A method for identifying, filtering, ranking and cataloging
information elements; as for example, World Wide Web
pages, of the Internet in whole, part, or in combination. The
method is preferably implemented in computer software and
features steps for enabling a user to interactively create an
information database including preferred information
elements such as preferred World Wide Web pages in whole,
part, or in combination. The method includes steps for
enabling a user to interactively create a frame-based,
hierarchical organizational structure for the information
elements, and steps for identifying and automatically
filtering and ranking by relevance, information elements, such as
World Wide Web pages for populating the structure, to form;
for example, a searchable, World Wide Web page database.
Additionally, the method features steps for enabling a user
to interactively define a frame-based, hierarchical
information structure for cataloging information, identifying a
preliminary population of information elements for a particular
hierarchical category arranged as a frame, based upon the
respective frame attributes, and thereafter, expanding the
information population to include related information, and
subsequently, automatically filtering and ranking the
information based upon relevance, and then populating the
hierarchical structure with a definable portion of the filtered,
ranked information elements.
METHOD FOR INTERACTIVELY CREATING
AN INFORMATION DATABASE INCLUDING 
PREFERRED INFORMATION ELEMENTS, 
SUCH AS PREFERRED-AUTHORITY, 
WORLD WIDE WEB PAGES 

This application is a continuation-in-part of U.S. patent 
application Ser. No. 09/143,733, filed Aug. 29, 1998, for an
invention entitled "A METHOD FOR INTERACTIVELY 
CREATING AN INFORMATION DATABASE INCLUD- 
ING PREFERRED INFORMATION ELEMENTS SUCH 
AS PREFERRED-AUTHORITY, WORLD WIDE WEB 
PAGES", from which priority is claimed. 

BACKGROUND OF THE INVENTION 

FIELD OF USE 

This invention relates generally to a method for 
identifying, filtering, ranking and cataloging information 
elements; as for example, Internet, World Wide Web pages, 
considered in whole, in part, or in combination; and more 
particularly, to a method, preferably implemented in com- 
puter software, for interactively creating an information 
database including preferred information elements, the 
method including steps for enabling a user to interactively 
create a frame-based, hierarchical organizational structure 
for the information elements, and steps, thereafter, for 
identifying by iteration and automatically filtering and rank- 
ing by degree of relevance information elements, for popu- 
lating the frames of the structure to form; for example, a 
searchable, World Wide Web page database. In further 
detail, the method features steps for enabling a user to 
interactively define a frame-based, hierarchical information 
structure for cataloging information, and, steps for identi- 
fying information elements to populate respective frames of 
the structure by iteration, the iteration including steps for: 
identifying a preliminary population of information ele- 
ments with the use of a search query based on respective 
frame attributes, frame attributes selectively including clas- 
sification designations, example pages, stop pages and/or 
control parameters used by conventional search engines, as 
required; supplementing preliminary population based on 
usage of example pages and/or stop pages; expanding the 
supplemented preliminary population to include related 
information; automatically filtering and computing informa- 
tion element ranking based on degree of relevance to the 
respective frame; and, thereafter, refining the identification 
with successive iterations of the steps described until iden- 
tification is deemed complete, whereupon the hierarchical 
structure is populated with a user-defined portion of pre- 
ferred information elements identified. 
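
The frame-based structure and the first steps of the iteration described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `Frame`, `query_terms` and `build_candidate_set` are hypothetical, the search results and link graph are supplied as plain in-memory values, and supplementing is simplified to adding the example pages themselves and removing the stop pages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Frame:
    """One category frame of the hierarchy (field names are hypothetical)."""
    name: str
    terms: List[str] = field(default_factory=list)       # classification designations
    example_pages: Set[str] = field(default_factory=set)
    stop_pages: Set[str] = field(default_factory=set)
    parent: Optional["Frame"] = None

    def query_terms(self) -> List[str]:
        """Return inherited terms (root first) followed by the local terms."""
        inherited = self.parent.query_terms() if self.parent else []
        return inherited + self.terms


def build_candidate_set(frame: Frame,
                        search_hits: Set[str],
                        out_links: Dict[str, Set[str]]) -> Set[str]:
    """Preliminary population -> supplement -> expand -> drop stop pages."""
    # Preliminary population returned by a search on the frame's query terms.
    pages = set(search_hits)
    # Supplement with example pages and remove stop pages (simplified).
    pages |= frame.example_pages
    pages -= frame.stop_pages
    # Expand by one hop: pages pointed to by, or pointing to, the current set.
    expanded = set(pages)
    for source, targets in out_links.items():
        if source in pages:
            expanded |= targets
        elif targets & pages:
            expanded.add(source)
    # Stop pages may have been drawn in again during expansion.
    return expanded - frame.stop_pages
```

Ranking and the successive refinement passes are left out here; in this sketch they would operate on the set returned by `build_candidate_set`.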

As a yet further problem, and potentially an even more 
perplexing one, not only has the computer revolution
created a greater need for information, but, undeniably, it has
created an abundance, indeed, an overabundance of infor- 
mation to meet that need. In fact, the computer revolution 
has spawned so much information, that it is now to the point 
where the amount of information available on most subjects 
is typically so large as to create the new and associated 
problems of going through that wealth of information, and 
selecting from it the items most relevant to the question at 
hand. 

For example, in the case of the Internet's World Wide 
Web, if one were looking for information concerning some- 
thing as straightforward as the restoration of an old car, there 
likely would be hundreds, if not thousands, of potential Web 
sites having as many if not more pages of information 
relating to the subject of old cars, and the parts, services and
techniques for their restoration. Accordingly, one faced with 
the problem of developing information on the subject of 
automobile restoration, would potentially be required to
locate and go through literally hundreds of Web pages in an
attempt to find those few most suited to his needs. 

In the past, the World Wide Web's approach to this 
problem has been to provide search facilities such as 
Yahoo®! and others, to assist Web users in finding the
information; i.e., Web pages, they might be looking for.
However, search facilities such as Yahoo! typically provide 
only generalized organizations of Web subject matter, those 
organizations being arranged as categories of Web pages, the 
categories and the things included in them being based on
the nature of the Web sites, the subjective points of view of
numerous staff classifiers working for the search facility, and 
the classification criteria they established. In accordance 
with this approach, organization of the information is, 
therefore, influenced by the respective points of view of the
various classifiers, the providers of the search facilities, and
the Web site providers. As a result, such Web subject matter 
organizations tend to be subjective and suffer from
over-inclusion and under-inclusion of information, which, in turn,
affects their relevance, accuracy and ease of use. 

Moreover, and of yet greater concern, is the fact that
formulating and maintaining organizations of Web subject 
matter in the fashion noted requires expenditure of substan- 
tial amounts of human time and effort and, accordingly, 
money. Particularly, continuous growth and change in Web
makeup requires such organizations of Web information to
be repeatedly supplemented and the existing framework 
revised to accommodate the introduction of new and chang- 
ing information. Accordingly, such approaches are man- 
power intensive, leading to higher costs for creation and
maintenance, and because of the extensive human
involvement, are, as well, subject to error. 

Still further, such search facilities, typically, are unable to 
group the information elements they return; e.g., Web pages, 
by their respective "relevance", that is, the degree to which
others have referred to; i.e., pointed to, the respective
elements; e.g., pages, as sources of information on the 
subject matter in question. Pages that have many references 
pointing to them are termed herein "authorities". In this 
scheme, and in the context of Web pages, "relevance" is a
function of the number and quality of links to an authority
page from various hub pages, referred to as the "authority 
weight" for the respective authority page, or, the number and 
quality of links from a hub page to various authority pages, 
referred to as the "hub weight" for the respective hub page. 

Moreover, and as will be appreciated, pages of higher
relevance; i.e. higher authority weight or higher hub weight, 
are "preferred" where one is seeking information concerning 
particular subject matter. Accordingly, "preferred" informa- 
tion elements; e.g., Web pages, are considered to have higher
relevance to some specific subject matter where the
information elements; e.g., Web pages, have either, higher
authority weight, or, higher hub weight with respect to the 
particular subject matter. And, as will also be appreciated, 
since information elements; e.g., Web pages, may both point
to authority pages; i.e. function as a hub, and also be pointed
to as an authority; i.e., function as an authority, such pages
may be relevant either as a hub page or as an authority page, or
as both. 
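
The mutual dependence of hub weight and authority weight described above can be sketched in Python as follows. This is a minimal illustration of the standard hub-and-authority iteration, assuming uniform link weights and a small in-memory link graph; the function names `hub_authority_weights` and `top_pages` are hypothetical, and none of the filtering of spurious factors discussed in this specification is applied.

```python
import math
from typing import Dict, List, Set, Tuple


def hub_authority_weights(links: Dict[str, Set[str]],
                          iterations: int = 20
                          ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) weights for a graph given as page -> pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub weight: sum of the authority weights of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize both vectors so the values stay bounded across iterations.
        for weights in (auth, hub):
            norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
            for p in weights:
                weights[p] /= norm
    return hub, auth


def top_pages(weights: Dict[str, float], k: int) -> List[str]:
    """Rank pages by descending weight (ties broken by name) and keep the first k."""
    return sorted(weights, key=lambda p: (-weights[p], p))[:k]
```

In this sketch a page that both points to others and is pointed to receives a non-zero value in both dictionaries, which corresponds to a page being relevant as a hub, as an authority, or as both.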

No prior reference has proposed systems or methods for
enabling a user to interactively create an information
database of "preferred" data elements such as "preferred" Web
pages; i.e., pages of either higher authority weight, or hub 
weight; i.e. "relevance", or, procedures for removing spurious
factors that arise during computation of hub and authority
weights for the respective pages.

With regard to relevance; i.e. weight, computation, workers
in the field have found that the computational accuracy
is adversely affected by such factors as "self-promotion",
"related-page promotion", "hub redundancy", "copied
pages", and "false authority." Particularly, it has been found
that during relevance computations pages with links to other
pages of the same Web site can improperly confer authority
upon themselves, thus giving rise to false promotion; i.e.,
"self-promotion," and adversely affecting relevance
computation accuracy. Further, it has been found that in addition to
self-promotion, related pages from the same Web site, as for
example, a home page and several sub-pages of the home
page can improperly accumulate authority weights, giving
rise to false promotion in the form of "related-page
promotion", which again adversely affects relevance
computation accuracy.

Further still, workers have found that a page may have
value only because of the hub links it contains; that is its
content may be otherwise irrelevant. In that case, if the hub
links for such a page can be found in other pages, the hub
links of such a page are redundant and may not be suitable
for inclusion. It is to be noted that often, the value of a hub
page resides in the links that it possesses, and not the content
of the page. Accordingly, where all the links of a hub page
can be found in better hub pages; i.e., hub pages having
greater numbers of relevant links, and where the content of
the hub page is otherwise not of interest, inclusion of the first
hub page gives rise to hub redundancy which reduces the
effectiveness of the computation.

Continuing, spurious results have also been found to be
introduced into relevance computations by the now common
practice of Web site providers including in their sites
material copied from other Web sites. Because of the economic
and creative pressures on Web site providers to produce
"content", providers often copy pages or page parts from
others rather than generate new and original material for
their sites. Though this approach may violate rights of the
originator in the work, since little effort or cost is required,
Web site providers find this a particularly fast and
convenient way of generating site content, and are especially
inclined to take this approach where the subject matter
copied has become popular.

Regrettably, however, existence of multiple copies of hub
and/or authority pages adversely affects relevance
computations. For example, multiple copies of hub pages
erroneously increase the authority weight of pages pointed to, the
same material being pointed to each time a hub is copied.
Likewise, multiple copies of authority pages also produce
problems. Particularly, copies of the same authority page
split; i.e. divide, the number of links pointing to the same
subject matter; i.e., the hub links pointing to the authority
subject matter are dispersed over the copies. As will be
appreciated, if there was only one copy of the authority, all
hub links for the authority would point to that one copy,
thereby, consolidating the effect of the links. However, if the
hub links rather point to different ones of the multiple
authority copies, the total number of links that would
otherwise be available is dissipated over the multiple copies.
Accordingly, and as is apparent, the occurrence of "copied
pages" adversely affects accuracy of the relevance
computation.

And, still further, it has been found that certain pages
pertaining to a number of unrelated topics; e.g., pages of
resource compilations, typically refer to; i.e., are linked to,
a number of other pages, and accordingly appear as if they
are "good hubs," even though many of the associated links
point to pages of unrelated subject matter, which in turn
[illegible] from the same page false "authority", once again,
adversely affects the accuracy of relevance computation.

In addition, not only have previously proposed methods
concerning links and computation of hub and authority
weights failed to suggest or disclose interactive creation of
information databases for preferred-authority data elements
such as Web pages, or procedures for removing spurious
factors that arise during computation of the relevance;
but further, such approaches have failed to
appreciate the importance and benefit derived from including
example pages, which may be introduced into the
computation so as to drive computation in a desired direction; i.e.,
identify pages considered relevant to the subject matter of
interest. Likewise, prior methods concerning hub and
authority weight computations have also failed to consider
the exclusion from computation of pages found not
desirable, such non-desirable pages serving to bias the
computation in unwanted directions; i.e., identify pages
[illegible] irrelevant to the subject matter of interest.

With respect to previously proposed methods concerning
computation of hub and authority weights, J. Kleinberg, for
example, in his U.S. patent application entitled: "Method
and System for Identifying Authoritative Information
Resources in an Environment with Content-based Links
Between Information Resources", Ser. No. 08/813,749, filed
Mar. 7, 1997 and now U.S. Pat. No. 6,112,202 and assigned
to the assignee of the current application, describes a method
for automatically identifying the most authoritative Web
pages from a large set of hyperlinked Web pages. More
specifically, Kleinberg explains his method applies to the
cases where; for example, one has a page whose content is
of interest, and desires to find other pages which are
authoritative with respect to the content of the page of interest.
However, while Kleinberg notes his method includes: steps
for conducting a search based upon a query composed from
the content of the page of interest; steps for, thereafter,
expanding the group of pages initially retrieved with pages
that are linked to the pages retrieved; and
steps for iteratively computing the relevance of the pages
retrieved based upon the "weights" for the respective page
link structures, his method fails to consider the interactive
creation by a user of a database structure for the information,
or optimization of the relevance computation by removal of
spurious factors which adversely affect accuracy. Still
further, Kleinberg fails to consider inclusion and/or
exclusion, respectively, of desirable and undesirable
information elements to influence the results of computation.

Likewise, S. Chakrabarti et al. in their pending U.S. patent
application entitled, "Method and System for Filtering of
Information Entities", Ser. No. 08/947,221 filed Oct. 8,
1997, also assigned to the assignee of the current
application, describe a method for determining the "affinity"
of information elements, the method including steps for
first obtaining an initial set of information elements,
thereafter, steps for expanding the initial set with "related"
information elements, and subsequently, iteratively
computing the relative affinity for the respective information
elements. However, as in the case of Kleinberg, Chakrabarti et
al. fail to consider or describe facilities for enabling a user
to interactively create a database structure for the information,
or optimization of the "affinity" computation by removing
spurious factors which adversely affect accuracy. Yet further,
Chakrabarti et al., like Kleinberg, fail to disclose or suggest
procedures for aiding computation by the inclusion of steps
for introducing example information elements; e.g., example
Web pages, into the process in order to direct the
computation in a desired direction, or excluding undesired
information elements; e.g., undesired Web pages, from the
process in order to avoid the computation being taken in
undesired directions.

SUMMARY OF THE INVENTION

Accordingly, it is an object of the present invention to
provide a method for identifying, ranking and cataloging
information.

Additionally, it is an object of the present invention to
provide a method for interactively creating and/or modifying
an information database including preferred information
elements such as preferred, World Wide Web pages,
considered in whole, in part, or in combination.

Further, it is an object of the present invention to provide
a method for improving the determination of relevance
amongst related information elements such as hyperlinked,
Web pages, considered in whole, in part or in combination.

Yet further, it is an object of the present invention to
provide a method for improving the determination of
relevance amongst related information elements such as Web
pages, considered in whole, in part, or in combination, by
filtering to reduce the effects of spurious factors which
adversely affect accuracy.

Still further, it is an object of the present invention to
provide a method for enabling a user to interactively develop
a personalized database structure for information organized
in accordance with the user preferences, which may be
subsequently populated with preferred information elements
such as hyperlinked, World Wide Web pages collected by the
user.

Yet further, it is also an object of the present invention to
provide a method for improving the determination of
relevance amongst related information elements such as Web
pages by introducing example information elements such as
example Web pages into the process to direct the
determination in a desired direction.

As well, it is an object of the present invention to provide
a method for improving the determination of relevance
amongst related information elements such as Web pages by
excluding undesired information elements such as undesired
Web pages from the process to avoid the determination
being taken in undesired directions.

Yet additionally, it is also an object of the present
invention to provide a method for enabling users to interactively
develop databases of preferred information elements, which
databases may be subsequently searched conveniently and
efficiently to identify information elements such as World
Wide Web pages, in whole, in part or in combination, having
relevance to subject matter of interest.

Briefly, to achieve at least one of the above and other
objects and advantages, the method of the present invention
features steps for enabling a user to interactively create
and/or modify an information database having a
hierarchical, frame-based organizational structure of the
user's selection, the frames of the structure for receiving
automatically retrieved, preferred information elements,
such as World Wide Web pages, taken in whole, in part, or
in combination, the pages being preferred based on
relevance to respective frames, the preferred pages being
identified by information queries submitted by the user for
search, and subsequent computation and filtering of page
relevance undertaken by iteration.

More specifically, in accordance with the method, the
information elements are defined as, one or more statements
of authority that form a unit of reference, such as part, or all
of a Web page, or a number of Web pages in combination,
that are found to have relevance to subject matter of interest
determined by improved, automated computation of
weights for links between information elements; e.g., weights
for hyperlinks between Web pages. Additionally, the method
features procedures for filtering the information elements to
diminish spurious effects which adversely affect
computation of relevance. Still further, the invention in preferred
form includes steps for introducing into the process example
information elements; e.g., example Web pages, found to be
desirable so as to bias the computations in a desired
direction, and steps for excluding undesired information
elements; e.g., Web pages, so as to suppress biasing of the
computation in unwanted directions.

In the interests of simplicity, and to assist understanding,
in the following discussion and throughout the specification,
usage of the more specific terms "page(s)" and "Web site(s)"
will be employed to exemplify, and should be understood to
embrace, respectively, the more general terms "information
element(s)" and "information source(s)" unless otherwise
expressly stated. Further, and as noted, an information
element will be considered as including one or more
statements of authority, as for example, one or more Web page
hyperlinks, contained in a Web page, part of a page, or a
number of Web pages, which form a unit of reference.

In this regard, it is to be noted that in preferred
form, the method of the present invention is implemented in
software suitable to be run on a conventional
personal computer having a central processing unit,
associated disk storage memory, and accompanying
input-output devices, such as keyboard, pointing
device, display monitor and printer. In preferred form the
method includes program steps for facilitating generation of
a display, for example at the computer monitor,
featuring an interface for enabling a user to interactively
compose and/or modify an adjustable, frame-based,
hierarchical, organizational structure representing an arrangement
of topics of the user's design. In accordance with the
invention, the user formulates the frame-based organization
structure to receive information elements, such as Web
pages, in whole, in part or in combination, which may be
subsequently automatically collected with the method
employing further input from the user to populate the
various frames of the organizational structure based on the
respective frame attributes, which attributes may include
classification designations, example pages, stop pages
and/or control parameters used by conventional search engines,
as required.

In preferred form, the interface includes one or more
screens respectively having multiple partitions for
presenting: a graphical representation of the frame-based,
hierarchical information structure of the user's creation; the Web
pages contained in the category frames of the structure; and
the components employed in selecting the Web pages for
populating the frames. More particularly, the interface
features a graphical presentation of the frame-based
hierarchical information structure, together with associated tools for
freely navigating and modifying the structure; as for
example, by adding, deleting or moving frames within the
structure to represent the tastes and preferences of the user.
Additionally, the interface includes partitions for displaying
the Web pages associated with a user-selected frame of the
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organizational structure, together with tools for manipulat- example pages and/or stop pages were specified, 

ing and managing the pages included at the frame. And, still Particularly, in the case where example hubs were specified, 

further in preferred form, the interface includes partitions preferably, any page pointed to by an example hub is used 

and associated tools for enabling the user to view respective to supplement the initial set; i.e., brought into the initial set. 

Web page content, such as pages and page links, associated 5 Further, in the case where example authority pages were 

with selected frames, and the frame attributes. specified, the initial set is preferably supplemented by 

Based on this interface presentation, the user may create including any page that points to at least any two example 

search queries for identifying pages which following itera- aut hority pages. Additionally, to the extent that stop pages 

live processing may be employed to populate the frames of havc been ^, ccificd io mc qucry> such stop pages arc 

the organizational structure. In this regard, classification 1Q climinatcd from mc mitial xL Fmthcr> oncc mc initial sct ^ 

designations example pages, stop pages and control param- supplcmcntcd ^ described, the supplemented initial set is 

eters may be selectively and alternatively combined as m £ C xpandedby^^ 

required to form query terms employed in the iterative . v . /, . . 6l \ & . J 4 . 4 v 6 . iL 

identification process toe supplemented initial set; i.e., pages that are either 

, : \ . . , , , , pointed to by pages of the supplemented initial set, or pages 

Ako in th* regard, it is to be understood that frame J5 ^ ^ tQ of ^ lcmented initial set> which as 

attributes may function as contributors to query terms, and wi]1 fce appreciated> would mcllldc spccified exam lc hub 

that vanous query terms may be used for multiple purposes. and ;fied £ , e mAo[ . Hna]1 ^ 

For example frame attributes may contribute query terms dfied WQuld ^ bfi climinated from the 

appropriate for use m generating an initial set of Web pages 6jq)aildcd> supplemeiltcd MM xt; i. c . ( root xK t0 cover the 

for consideration and additionally be employed for deter- m possibility of stop pages navillg been drawn m the 

mining link weights during computation. More specifically, expansion process, 

while frame attributes may define the subject matter catego- ... , . ...... 

ries of the organizational structure; i.e., function as classi- . ln ^is regard, the method thus includes steps for gener- 

fication designators, and, therefore, be suitable for initially » to S an initial set of pages based upon frame attributes as 

retrieving pages relevant to those categories, the frame „ described,^ 

attributes as query terms may also be used to increase the 25 q^nes and following links ; into and out of already fetched 
weight afforded a link by virtue of the query term falling P a S es > the l | eraUon * cai ™ d out ™f as described the initial 
within a predetermined "window*' of text from the link, set 15 su PP^ented and expanded to form the "root set* 
thereby, suggesting heightened relevance for the link by U P 0D whlch later copulation can be performed, 
virtue of its proximity to the query term as will be more fully „ Following creation of the root set, the method includes 
described in connection with the detailed description of the steps for associating a hub-weight parameter and authority- 
preferred embodiment hereafter. weight parameter with each Web page, and iteratively cal- 
Further, frame attributes as query terms may also include, culating the relevance for the pages of the root set based on 
and, indeed, exclusively include identification of example mc resulting, respective, hub-weight and authority-weight 
hub pages and authority pages, the identities of which may 35 values for eacn P a S e - 

be made part of a query to bias the relevance computation in In accordance with the method, the hub weights and 

desired directions. Additionally, and as noted, query terms authority weights of the respective pages are based on 

may also include stop pages, i.e., identification of pages for summations of respective authority weights and hub weights 

avoidance which have been found to bias the relevance for the links of the pages. In this regard, and, as will be 

computation in undesired directions, as well as control ^ described hereafter, weights for respective links may be 

parameters helpful for managing the extent and amount of increased to reflect the significance of the link. In accor- 

CPU, memory and storage resources used during searching, dance with the method, the calculation produces a distribu- 

as are well known in the art. tion of scores that represent the degree of relevance for the 

Also in preferred form, computation of Web page rel- respective pages, which scores are, thereafter, ordered by 

evance is undertaken by defining a Web page and its 45 numerical value to establish rankings of the pages, 

associated links, as embracing a hub page, and/or an authority page, wherein a hub page "points to"; i.e., links to, one or more authority pages, and an authority page is "pointed to"; i.e., linked to, by one or more hub pages. In this regard, and as noted, usage of the term "Web page" applies to part of a page, a whole page, and a combination of pages which may, respectively, constitute one or more statements of authority that form a unit of reference.

Continuing, the method includes steps for constructing a "root set" of Web pages likely to be relevant to a topic selected by the user. The root set is developed by first generating an initial set of Web pages with the use of a conventional query derived from the local and inherited attributes of the category frame for the database hierarchical organizational structure the user is interested in populating, the query so derived, thereafter, being first applied in conventional fashion against the World Wide Web. As described, frame attributes may selectively include frame classification designations, example pages, stop pages, and/or control parameters, as required.

Specifically, the computation produces hub and authority weights for all pages, and then returns both a predetermined portion of the highest-ranking hub pages and highest-ranking authority pages.

In accordance with the invention, the method additionally features steps for improving computational accuracy of the relevance for the Web pages. Specifically, the method features steps executed during the computation of relevance for filtering spurious computational factors such as "self-promotion", "related-page promotion", "hub redundancy", "copied pages" and "false authority." In preferred form, the method includes steps for filtering "self-promotion" from the computation, the steps including the discarding of objectionable links between pages from the same Web site. Further, the method includes steps for filtering "related-page promotion" from the computation, which steps include "re-packing" the Web pages, for any Web site having multiple pages showing non-zero authority, during which re-packing, all authorities other than the largest authority being set to zero.
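The hub and authority computation with the "self-promotion" and "related-page promotion" filters described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the use of the host name to decide what counts as one Web site, the fixed iteration count, and the normalization step are all choices made for this sketch.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Treat the host name as the 'Web site' (an assumption of this sketch)."""
    return urlparse(url).netloc


def hub_authority_scores(links, iterations=20):
    """links: iterable of (source_url, destination_url) pairs."""
    # Self-promotion filter: discard links between pages of the same site.
    edges = [(s, d) for s, d in links if site_of(s) != site_of(d)]
    pages = {p for edge in edges for p in edge}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages pointing in.
        new_auth = defaultdict(float)
        for s, d in edges:
            new_auth[d] += hub[s]
        auth = {p: new_auth[p] for p in pages}

        # Related-page filter ("re-packing"): per site, keep only the
        # largest authority and set all other authorities to zero.
        best = {}
        for p in pages:
            site = site_of(p)
            if site not in best or auth[p] > auth[best[site]]:
                best[site] = p
        auth = {p: (a if best[site_of(p)] == p else 0.0) for p, a in auth.items()}

        # Hub weight: sum of the authority weights of the pages pointed to.
        new_hub = defaultdict(float)
        for s, d in edges:
            new_hub[s] += auth[d]
        hub = {p: new_hub[p] for p in pages}

        # Normalize so the weights stay bounded (a choice of this sketch).
        for scores in (hub, auth):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

Calling `hub_authority_scores` with a list of link pairs returns two dictionaries, from which a predetermined portion of the highest-ranking hubs and authorities can then be taken.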

Following return of the initial set of pages responsive to the query, the initial set is supplemented based on whether

Still further, the method in preferred form also includes steps for filtering "hub redundancy", the steps including
identifying the highest weight; i.e., "best," hub during computation, zeroing the authority values of all pages pointed to by that hub, re-computing hub values; and thereafter, outputting the next best hub, zeroing authority values of pages it points to, and so forth.

Regarding "copied pages", the method in preferred form also features steps for diminishing the adverse effect on relevance computation caused by copied pages. Specifically, the method features steps prior to computation of relevance for determining whether two or more pages can be considered copies of one another by means of a "similarity" checking procedure, canceling all but one of the pages, the retained page being deemed the original, redirecting the links to the copies found to the page deemed the original, and increasing the weight of the links from the page deemed the original by adding a factor representing the significance of the multiple copies of the original page having been made. Particularly, in preferred form, the factor used to increase link weight is equal to the log of the number of copies found of the page.

And, yet additionally, the method in preferred form features steps for filtering "false authority", the steps including: allowing each link in a Web page to have its own hub value; incrementing the authority value of the destination page with the hub value of the link when authority values are calculated; and re-computing the hub values of the original link with the authority value of the destination page, and accordingly, by a spreading function, the hub values of neighboring links. Furthermore, the final hub value of the page is made the sum of the hub values of its links.

Further, and as noted, in connection with computation of page hub weight and authority weight, respective weights of links within a page may be increased beyond a default value to reflect relevance. For example, first, where a query term appears at a distance "d" within a window "W" of terms from the link, a factor is added to link weight which is made proportional to [W-d]. As will be appreciated, the physical proximity of a search term to a link implies relevance for the link to the search term and, accordingly, the query. Additionally, and thereafter, where copied pages have been found, and all but one deemed the original eliminated, to reflect the significance of the page having been copied, the weight of the links for the retained page are increased, particularly, and as noted, by a multiplication factor equal to the log of the number of copies applied to link weight.

detailed description when read with reference to the accompanying drawings in which:

FIG. 1 is a diagram illustrating an Internet environment including a number of World Wide Web sites and associated [...] having page information suitable for being maintained in a frame-based, hierarchical database created or maintained in accordance with the method of the present invention;

FIGS. 2(A-B) is a diagram illustrating a hierarchical organization of information suitable for being maintained in a frame-based, hierarchical database created or maintained in accordance with the method of the present invention;

FIGS. 3(A-B) is a diagram illustrating a hierarchical organization of information suitable for being maintained in a frame-based, hierarchical database in which a new information category frame has been suggested for addition in accordance with the method of the present invention;

FIGS. 4(A-B) is a diagram illustrating a hierarchical organization of information suitable for being maintained in a frame-based, hierarchical database in which a new information frame has been added and populated with Web pages in accordance with the method of the present invention;

FIG. 5 is a schematic illustration of the display interface presented to a user for enabling creation or modification of a database hierarchical organizational structure in accordance with the method of the present invention;

FIG. 6 is a schematic illustration of the display interface presented to a user for disclosing the page population of an information frame of a database hierarchical organizational structure in accordance with the method of the present invention;

FIG. 7 is a schematic illustration of the display interface presented to a user for disclosing the content of a page included as a member of the page population for the information frame of a database hierarchical organizational structure in accordance with the method of the present invention;

FIG. 8 is a diagram illustrating a root set of pages expanded from an initial set of pages returned in response to a query based upon the attributes of a frame proposed to be added to a database hierarchical organizational structure in accordance with the method of the present invention;

FIG. 9 is a flow diagram illustrating the general steps of the method in accordance with the present invention;
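The two link-weight adjustments described in this section, the proximity factor proportional to [W-d] and the copy factor equal to the log of the number of copies, can be illustrated with a short sketch. Several details are assumptions made only for illustration: the function name and parameters, the constant of proportionality (taken as 1/W), the natural logarithm, and the additive reading of the copy factor (the text describes it both as an added factor and as a multiplication factor).

```python
import math


def link_weight(link_pos, query_term_positions, window, copies=1, base=1.0):
    """Weight of one link on a page (hypothetical helper for illustration).

    link_pos: index of the link among the page's terms.
    query_term_positions: indices at which query terms occur on the page.
    window: the window size W, measured in terms.
    copies: number of copies found of the page, counting the original.
    base: the default link weight.
    """
    weight = base
    # Proximity: a query term at distance d < W adds a factor proportional
    # to (W - d); the constant 1/W is an assumption of this sketch.
    for pos in query_term_positions:
        d = abs(pos - link_pos)
        if d < window:
            weight += (window - d) / window
    # Copies: add log(number of copies). Pages with no copies are unchanged.
    if copies > 1:
        weight += math.log(copies)
    return weight
```

A link with a query term two positions away in a window of five terms gets a weight of 1.6 under these assumptions, while a link with no nearby query term keeps the default weight of 1.0.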
Subsequently, and still further, where example pages are used, because of the importance of respective example pages, the weight of their respective links within an example page are likewise increased. More specifically, the weights of all links within example hub pages are increased by a predetermined multiplication factor; and in the case of example authority pages, the weight of links within an authority page are increased by first identifying a page region, and thereafter, applying a multiplication factor to the weight of any link within the region depending on the number of example links found within a window of predetermined size located at such a subject link within the identified region.

Still further, in preferred form, the method in accordance with the invention includes steps for ranking the pages of the root set based on relevance following computation of page hub and authority weights, and thereafter, truncating the root set to a number of highest ranking pages prescribed by the user.

FIG. 10 is a flow diagram illustrating the more specific steps associated with the "Develop Classification Frame Hierarchy" general step of the method in accordance with the present invention illustrated in FIG. 9;

FIG. 11 is a flow diagram illustrating the more specific steps associated with the "Prompt User To Modify Frame Structure" step of FIG. 10, FIG. 10 itself illustrating the "Develop Classification Frame Hierarchy" general step in accordance with the invention illustrated in FIG. 9;

FIG. 12 is a flow diagram illustrating the more specific steps associated with the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9;

FIG. 13 is a flow diagram illustrating the more specific steps associated with the "Do Key Word Search To Identify Initial Set Of Information Elements" step of FIG. 12, FIG. 12 itself illustrating the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9;
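The "hub redundancy" filtering and the final ranking and truncation of the root set described in this section can be sketched as follows. The function names and data shapes are hypothetical, and the hub value is taken here as the plain sum of the authority values of the pages a hub points to; the sketch assumes authority weights have already been computed.

```python
def select_hubs(links, authority, count):
    """Return up to `count` hubs, discounting authorities already covered.

    links: dict mapping each page to the list of pages it points to.
    authority: dict mapping page -> authority weight.
    """
    authority = dict(authority)  # work on a copy, leave the input untouched
    remaining = set(links)
    chosen = []
    while remaining and len(chosen) < count:
        # Re-compute hub values from the current authority values.
        hub = {p: sum(authority.get(t, 0.0) for t in links[p]) for p in remaining}
        # Sorting first makes ties resolve in a repeatable order.
        best = max(sorted(remaining), key=hub.get)
        if hub[best] <= 0.0:
            break  # every remaining hub only points at covered pages
        chosen.append(best)
        remaining.remove(best)
        # Zero the authority value of every page the chosen hub points to.
        for target in links[best]:
            authority[target] = 0.0
    return chosen


def truncate_ranked(scores, count):
    """Rank pages by weight and keep the `count` highest-ranking pages."""
    return sorted(scores, key=scores.get, reverse=True)[:count]
```

Because the authority values of pages reached by an already chosen hub are zeroed, a second hub that points to the same pages contributes nothing new and is passed over in favor of a hub that reaches different authorities.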
DESCRIPTION OF THE DRAWINGS

The above and further objects, features and advantages of the invention will become apparent from the following more

FIG. 14 is a flow diagram illustrating the more specific steps associated with the "Expand Initial Set To Root Set" step of FIG. 12, FIG. 12 itself illustrating the "Populate
Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9.

FIG. 15 is a flow diagram illustrating the more specific steps associated with the "Ranking Information Elements Of Root Set" step of FIG. 12, FIG. 12 itself illustrating the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9;

FIG. 16 is a flow diagram illustrating the more specific steps associated with the "Generate Weights For Information Elements" of FIG. 15, FIG. 15 itself illustrating the "Rank Information Elements Of Root Set" of FIG. 12, FIG. 12 itself illustrating the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9.

FIG. 17 is a flow diagram illustrating the more specific steps associated with the "Determine Information Element Authority And Hub Scores" of FIG. 15, FIG. 15 itself illustrating the "Rank Information Elements Of Root Set" of FIG. 12, FIG. 12 itself illustrating the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9; and

FIG. 18 is a flow diagram illustrating the more specific steps associated with the "Truncate Ranked Information Elements" step of FIG. 12, FIG. 12 itself illustrating the "Populate Selected Frame With Information Elements" general step of the method in accordance with the present invention illustrated in FIG. 9.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The method of the present invention overcomes problems found in prior approaches to organization and retrieval of information; as for example, pages of the World Wide Web, by providing a method for identifying, filtering, ranking, and cataloging information, and, particularly, Web pages. More specifically, the method is preferably implemented in computer software suitable for being run on a conventional personal computer and includes steps for enabling a user to interactively create and or modify an information database featuring a hierarchical, frame-based, organizational structure of the user's selection for receiving information elements, such as World Wide Web pages, also of the user's selection. Further, the method features steps for enabling the identification of information elements, such as Web pages, in whole, in part or in combination, which based upon relevance as determined by improved, automated computation of the link structure between information elements, are considered preferred.

As will be appreciated by those skilled in the art, while the method of this invention has application to use by individuals for creating personalized, preferred-authority; e.g., high-authority, information databases, which may be developed from information sources such as the World Wide Web, in which the user can tailor the information organization to his tastes, the invention also has application to broad, business applications, not only for commercially cataloging information sources such as the World Wide Web, and providing facilities for distilling information retrieved to the higher levels of authority, but also, to such applications as building preferred-authority databases for use in law, medicine, engineering and other fields.

The method in accordance with the present invention is shown in its general aspect in FIG. 9. As shown there,

method 2 is seen to broadly include step 4 for enabling a user to develop a personalized, frame-based, hierarchical information classification structure for the database. Further, following development of the frame-based, hierarchical, information classification structure at step 4, method 2 is seen to include step 6 for enabling the user to select; i.e., randomly access, the information frame from the classification hierarchy he wishes to populate with information elements; e.g., Web pages.

Following user selection of the hierarchical classification to be populated, at step 6, method 2 includes step 8 for enabling the automated retrieval of information elements; e.g., Web pages, from the information source; e.g., the Web, for populating the selected frame. Thereafter, method 2 includes step 10 for prompting the user to indicate whether there are any other frames in the information classification hierarchy the user would like to populate with information. If the user indicates there are additional frames of the classification hierarchy to be populated, method 2 returns at branch 12 to frame step 6, where the user is again permitted to designate a frame to be populated, followed by subsequent transition to step 8 for enabling automated retrieval of information for the newly selected frame.

As will be appreciated, the noted sequence of frame selection at step 6, automated population of the frame at step 8, and the prompt at step 10 as to whether any frames remain to be populated with information, would continue until the user has considered all the frames he wishes to populate.

Once all the frames the user wishes to populate had been exhausted, method 2 advances over program-flow branch 14 to step 16, where the user is prompted to indicate whether there are any modifications of the information classification frame hierarchy which the user would like to undertake. In the case where the user would like to make changes to the classification structure; e.g., the addition, deletion or movement of any frames, method 2 would advance over program-flow branch 18 back to classification development step 4, at which the user would be enabled to enter desired modifications to the information classification organizational structure.

Thereafter, and as would be appreciated by those skilled in the art, following entry of all desired modifications to the information classification structure, method 2 program flow would again advance through method steps 6, 8 and 12 to enable population of, and modifications to the information classification frame structure as described above.

Finally, following information population of any modifications to the classification structure, the user, at step 16, would again be prompted to indicate whether any further changes to the classification structure were desired. If the user then indicates that no further modifications to the information classification structure are desired, method 2 would advance over program-flow branch 20 to finish.

As would be appreciated, and as noted above, the method of the present invention has application to a broad range of information sources. However, for the sake of clarity and simplicity, but, with no sense of limitation, the following more detailed description of method 2 will be undertaken with reference to the World Wide Web and the information pages available there.

As noted, the World Wide Web of the Internet, referred to here for simplicity as the "Web", represents a valuable and important information resource, including literally hundreds of millions of documents accessed by tens of millions of users daily. With reference to FIG. 1, as is well known, Web 22 includes millions of Web sites, several of which, for
purposes of illustration, are schematically represented as Web site servers 24 to 32, it being understood that a single server might host one or more sites. Additionally, and as shown, each Web site 24 to 32 includes numerous information pages arranged in Web applications; e.g., Web sites, Web site databases, etc., 34 to 66. Further, and as is also well known, a user, at his personal computer 68 equipped with a suitable Web browser and communications software, can access Web 22 over his ordinary phone line 70, the public switching network 72, and through an Internet service provider 74, which itself may be connected to public switching network 72 by an ordinary telephone line 76 and to Web 22 by one or more high-speed data lines and indicated collectively as cable 78. And, with this setup and some computer communication protocol magic, the user can access the literally hundreds of millions of documents available at applications 34 to 66, and others like them, on Web 22.

As pointed out, however, this great mass of information presents difficulties for the user in the form of retrieval and organization problems. And, as also pointed out, method 2 of the present invention provides the user with a means for dealing with those problems.

Particularly, and as noted in connection with the above description of the broad aspects of the invention, method 2 provides solutions to those problems in the form of steps for enabling the user to interactively create an information database having an organizational structure which the user can interactively personalize to his tastes for holding the information he retrieves, and steps for, thereafter, enabling automated filtering, and retrieval of reduced size; i.e., distilled, collections of preferred Web pages responsive to queries based on the information organizational structure the user has created.

As described in connection with FIG. 9, method 2 includes step 4 for enabling the user to develop a frame-based, hierarchical information classification structure for his personalized database. As shown in FIG. 10, step 4 of method 2 includes a series of more detailed steps for carrying out that procedure. Specifically, and with reference to FIG. 10, classification development step 4 is seen to include step 80 which follows activation of the software in which method 2 is implemented and embodied at user's personal computer 68, step 80 prompting the user to identify; i.e., provide, the file name and path, of the hierarchical database frame structure method 2 should initially access.

As will be appreciated, in the case where the user has previously developed a database structure, he could call it at this point, and continue with its use and evolution.

In the alternative, where the user wishes to develop a new structure, but, one having some similarity to the preexisting structure, he could designate the preexisting structure and employ it as a basis for the new structure and database. In such case, however, the user would be required to rename the preexisting structure if he intended to retain it, otherwise, in conventional fashion, the preexisting structure, as modified, would be saved under the original structure filename, thus corrupting the original structure.

In the case where no preexisting structure is available, the user may simply start from scratch; i.e., from nothing, indicate a new name for the structure to be created, and proceed.

Continuing with reference to FIG. 10, following prompt step 80, method 2 is seen to include step 82 which, responsive to the user's designation, causes program flow to advance either over branch 84 to retrieve at step 86 any preexisting structure which the user has identified, or over

branch 88 in the case where the user has indicated he is going to develop a new structure. Where the user indicates he is going to develop a new frame structure, method 2 program flow advances over branch 88 to step 90 at which the user is prompted to provide the name for the new classification structure. Following step 90, and the user's submission of an identification for the new structure, method 2 advances to step 92 where the user is prompted to provide an initial structure element; i.e., a classification frame, for the new structure. Subsequently, method 2 program-flow advances from either step 86, for retrieval of a designated preexisting frame structure, or from step 92 for initiation of a new frame structure, to converge at step 94, where method 2 displays the frame structure to begin processing with.

With regard to the information structure, experience has shown, hierarchically organized data and, particularly frame-based, hierarchically organized data featuring representations of information categories as a hierarchy of frames having frame attributes and attribute values, that characterize and distinguish the respective frames and their associations to each other, provides a representation that enables users to more readily understand and appreciate the information elements and their relationships. Still further, it has also been found that the hierarchical organization of information enables a much [...] when information is [...] to be retrieved. Particularly, when a particular element of information is sought, identification of its category [...] not only designates the features to be looked for, but also, immediately excludes features, and other aspects of the organizational structure not to be looked for, thus more immediately directing the user to the relevant section of the organization.

Accordingly, method 2 in preferred form, supports frame-based, hierarchical organizational structures for the information the user seeks to catalog. FIG. 2 illustrates such an organizational structure.

As shown in FIG. 2, a frame-based, hierarchical organization structure 100 which was previously created, is seen to include a [...] of frames [...] in hierarchical relation for [...] classification of information. As is well understood in the art, the respective frames feature attributes and attribute values for identifying the nature of each frame and its relationships to the other frames. Particularly, frame attributes and attribute values may include classification descriptors for identifying distinguishing characteristics of the respective frames, and further, in accordance with method 2, include additional parameters helpful in identifying preferred pages; i.e., higher relevance pages, for populating the respective frames. More specifically, in preferred form, frame attributes and attribute values may include example pages intended to bias iterative identification of preferred pages for populating respective frames in directions deemed desirable. Additionally, frame attributes may also include stop pages; i.e., pages found to bias iterative identification of preferred pages in directions deemed undesirable, which pages are to be excluded from processing. Yet further, frame attributes and attribute values may also include control parameters known in the art which a search engine may use to assist in generating sets of pages in accordance with method 2. More specifically, control parameters may include parameters helpful for managing the extent and amount of CPU, memory and storage resources used during searching, as are well known in the art.

Continuing, in accordance with association rules commonly employed in hierarchical organizations, attributes appearing at a particular frame level in the hierarchy, will be inherited by or may otherwise influence all depending
frames of lower hierarchical level. Further, within a level, frames may be given different attributes and or different attribute values, in the form of descriptors to, thereby, identify different subcategory types within the category level.

The nature of frame relationships may be readily understood with reference to structure 100. As seen in FIG. 2, structure 100 features three levels of organization, 102 to 106, the highest and most general 102 including four frames, specifically frames 108 concerning "Business", 110 concerning "Entertainment", 112 concerning "Science" and 114 concerning "News". Beneath frames 108 to 114 is a second categorization level 104 which further defines first level 102. Particularly, and for ease of explanation with reference to frame 108 "Business", only, structure 100 is seen to feature frames 116 "Companies" and 118 "Finance", both of which depend from frame 108. And, beneath frames 116, 118, structure 100 is further seen to include a third category level 106 which yet additionally defines second level 104 and first level 102. Particularly, third level 106 is seen to include frames 120, "Computers"; 122, "Products & Services"; 124, "Savings & Securities"; and 126, "Job," frames 120 and 122 depending from frame 116, "Companies" and frames 124 and 126 from frame 118, "Finance."

Accordingly, based on the frame structure and associated classification descriptor frame attributes and attribute values just described, it would follow that frame 122, "Products & Services", as a "child" of frame 116, "Companies" and "grandchild" of frame 108, "Business", in view of the above discussion concerning attribute inheritance, carries the classification descriptor limitations of its progenitors. Specifically, frame 122 would be considered to include product and service information of business companies, only.

In the case where a user intending to employ structure 100 for organizing his information found such limitations inappropriate or undesirable, in accordance with the present invention, he could readily undertake interactive modification of structure 100.

While at first blush, this may seem straightforward, those skilled in the database art will appreciate that in the past, it was not readily possible to modify database structure, as to do so would typically require reloading of the database data. As is apparent, from the above discussion of hierarchical frame attribute inheritance rules, if a frame in a hierarchy is changed, the limitations associated with related frames of the hierarchical structure; e.g., parent, child, related frames, must also change, thus potentially causing data previously held at a frame prior to a frame structure change, to no longer be appropriate for the same frame after a modification of the structure.

shown in FIG. 5, interface 138 in preferred form is seen to include a first screen 140 having a partition 142 for displaying the hierarchical, information organizational structure 100. Additionally, interface screen 140 is seen to have a second partition 144, including graphically presented tools for modifying structure 100. Specifically, tool partition 138 for screen 140 is seen to include a tool 146 for selecting frames [...]. Additionally, tool partition 138 is seen to include tools 148 and 150 for, respectively, adding and deleting frames from structure 100. In accordance with the invention, method 2 includes program steps for enabling a user to freely move frames [...] structure 100 with tool 146 in [...] fashion. Still further, tool partition 138 is seen to include a "zoom" tool [...] for enabling the user to zoom in and zoom out organizational structure 100 to see, respectively, fewer or more frames, thereby aiding the user's perspective in laying out and modifying structure 100.

Yet additionally, interface screen 140 is also seen to have a partition 152 including a section 154 for identifying the name 156 of organizational structure 100. Partition 152 is also seen to include a section 158 including drop-down menus in conventional "Windows" fashion for enabling management of interface 138. In preferred form, the menus include elements, such as, "File" 160, "Edit" 162, "View" 164 and "Help" 166. Still further, screen partition 152 is also seen to include a section 168 having interface mode buttons for enabling movement between interface mode screens. More specifically, section 168 of partition 152 is seen to include a mode button 170 "Structure" for viewing organizational structure 100 at screen partition 142, a mode button 172 for viewing the information element; e.g., Web page, content of any frame selected with tool 146 as will be more fully described below. Finally, section 168 of screen partition 152 is also seen to include a mode button 174 for viewing the content of the respective information elements; e.g., Web pages, populating a particular frame of structure 100, as also will be described more fully below.

Continuing with reference to FIG. 6, screen interface 138 in preferred form is also seen to include a second screen 176 having multiple partitions. Specifically, screen 176 is seen to include a partition 178 for displaying the information elements; e.g., Web pages, which populate a particular frame of organizational structure 100. In accordance with the invention, method 2 includes steps for presenting the pages of a frame identified as authorities at column 180, and pages identified as hubs at column 182. Further, partition 178 is also seen to include presentation of the attributes, specifically classification descriptors, for the frame of structure 100 being presented at partition 178 at partition region 184, and the title for the respective frame at partition region 186.
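The frame attribute inheritance described in this section, and the derivation of a query from a frame's local and inherited attributes, can be illustrated with a small sketch. The `Frame` class, its method names, and the descriptor strings are hypothetical and chosen only for illustration; the frame names follow the FIG. 2 example, and a real implementation would also carry example pages, stop pages and control parameters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Frame:
    name: str
    descriptors: List[str] = field(default_factory=list)  # local attributes
    parent: Optional["Frame"] = None
    children: List["Frame"] = field(default_factory=list)

    def add_child(self, child: "Frame") -> "Frame":
        child.parent = self
        self.children.append(child)
        return child

    def inherited_descriptors(self) -> List[str]:
        """Descriptors of all progenitors, most general first."""
        return self.parent.all_descriptors() if self.parent else []

    def all_descriptors(self) -> List[str]:
        return self.inherited_descriptors() + self.descriptors

    def query(self) -> str:
        """A plain keyword query built from local and inherited attributes."""
        return " ".join(self.all_descriptors())

    def move_to(self, new_parent: "Frame") -> None:
        """Moving a frame changes what it inherits, so it needs re-populating."""
        if self.parent:
            self.parent.children.remove(self)
        new_parent.add_child(self)
```

Under these assumptions, a "Products & Services" frame placed beneath "Companies" and "Business" yields a query carrying the descriptors of both progenitors, and moving it beneath "Finance" changes the query, which is why the pages previously held at the frame may no longer be appropriate after the structure is modified.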

The invention provides steps for easily and quickly identifying information for re-populating modified frames, and additionally and independently, provides steps for supporting a display interface that enables the user to readily add, delete, or move frames within a hierarchical information organizational structure.

To facilitate this, method 2 of the present invention features steps for presenting a display at the monitor of user's computer 68 for enabling the user to interactively and easily modify structure 100. With reference to FIG. 5, user's computer 68 is seen to have a monitor 132 featuring a display 136 at which interface 138 in accordance with method 2 is provided. In accordance with the invention, method 2 includes program steps for furnishing interface 138 with one or more screens having multiple partitions. As

As also seen in FIG. 6, interface screen 176 further includes a partition 188 for displaying frame structure 100. In accordance with method 2, structure 100 at partition 188 may be readily scrolled in typical "Windows-Explorer" fashion. In this manner, control of the frame content at partition 178 may be readily effected by selecting frames of structure 100 in conventional fashion; as for example, with a mouse pointer. Also in preferred form, frames of structure 100 may again be freely managed; for example, added, deleted and moved at partition 188.

Continuing, screen 176 in accord with method 2 is also seen to include a partition 190 for enabling editing of the frame page content. More specifically, partition 190 is seen to include information entry fields, 194, 196 for respectively entering modifications to the frame classification descriptor
attributes shown at field 192. As noted, in accordance with
the invention, frame attributes may include classification
descriptors, example pages, stop pages, and/or control
parameters, which may be selectively combined to control
the initial and subsequent queries for returning information
elements; e.g., pages, for populating the selected frame in
accordance with method 2 as will be described more fully
below. Particularly, entry field 194 enables the user to specify
classification descriptor attributes to be included in the
initial query, while entry field 196 enables the user to
expressly exclude classification descriptor attributes not
desired because of known lack of relevance to the subject
frame. Additionally, partition 190 is seen to include a feature
palette scroll box 198 having a predetermined list of frame
classification descriptors known to produce pages of authority
for the features available from the feature palette scroll
box. In accordance with method 2, where the user is uncertain
what descriptors to include for the selected frame, he
can make reference to the feature palette. Still further,
partition 190 includes entry controls 193, 195, 197 for enabling
the user to identify and enter, respectively, example
authorities, example hubs and stop pages above described.
In preferred form, suitable example authority pages, hub
pages and stop pages at lists 180, 182 of partition 178 may
be highlighted and, thereafter, designated for entry at
controls 193, 195, 197, respectively. In
addition to controls 193, 195, 197 at partition 190, controls
in the form of buttons well known in the art, but not shown
for purposes of simplicity, may be placed in partition 178
related to the respective authorities and hubs listed, which a
user could activate, for example, with a mouse pointer, to
identify the associated authority or hub for inclusion as an
example page.

Finally, screen 176 is also seen to include a partition 152
identical to that of 152 of screen 140 including, respectively,
designation of the display structure filename, menus, and
mode buttons.

Continuing with reference to FIG. 7, interface 138 in
preferred form is seen to include a third screen 202, again
having multiple partitions. In the case of interface screen
202, a partition 204 is provided for displaying the content of
a document included at lists 180 or 182 of, respectively,
authority or hub pages for a selected frame presented at
partition 178 of interface screen 176. As will be appreciated,
presentation of the content and links of an authority or hub
page enables the user to quickly and easily monitor the
effectiveness of the query and search process; i.e., frame
attributes, and iteratively adjust the pages returned to
populate the selected frame of structure 100.

To further assist in that process, in preferred form screen
202 also again includes editing partition 190 and structure
display partition 188 shown at screen 176. Still further,
screen 202 is also seen to include partition 152 shown at
screen 176 and screen 140 which presents the filename for
structure 100, drop-down menus and mode buttons.

Continuing with reference to FIG. 10, following display
of the information structure; e.g. structure 100, method 2
includes step 220 for enabling the user to modify structure
100. As better seen in FIG. 11, method 2 includes step 222
for prompting the user to select a frame to modify. As will
be appreciated, step 222 would be interactively conducted
with the user at method interface 138. Specifically, method
2 includes program steps for successively presenting to the
user interface screens at which the user can make judgments
as to whether changes in structure 100 are required or
desired.

For example, once the user has selected a frame of
structure 100 to modify at step 220, in accordance with the
method, interface 138 provides displays; e.g., screens 140,
176 and 202, for enabling the user to make judgments as to
whether frame modification would be warranted.
Particularly, at step 224 following step 220 in FIG. 11, the
user can make a judgment as to whether frame structure 100
is too general or not, based upon a review of the authorities
and hubs presented at interface screen 176 and their content
at screen 202; for example, where the frame existed prior to
being worked [illegible] 100, method 2 proceeds over branch
226 to step [illegible] where the user is enabled to split the
selected [illegible] add, at step 230, more specific [illegible]
attributes that would be designated at step 232. As would be
appreciated, addition of a frame could be readily effected
with use of interface 138 as described above.

Thereafter, method 2 program flow loops back over
branch 233 to modification prompt step 222, where the user
may again assess whether further modifications are necessary.
For example, if after specifying addition of a frame at
step 230, the user determines the frame specified at step 232
is too specific, the user would advance method 2 over branch
[illegible] to 236, where the user could then advance method
2 over branch [illegible] to [illegible] where the user could
readily [illegible] a frame at interface 138 as above described
and [illegible] re-specify a parent frame at step 242. And,
again, method 2 would loop back to step 222 over method 2
branch 244.

Following return to step 222, the user could again determine
if any further modifications of structure 100 were
called for. For example, if the user neither found the selected
frame too general nor too specific, method 2 advances over
branch 246 to step 248 where the user could evaluate
whether the selected frame is misplaced and required to be
moved. If the user determines that the selected frame should
be moved, method 2 advances over branch 250 to steps 252,
254 and 256 where associated sub-frames could be
removed and replaced in structure 100 as required at steps
252 and 254, respectively, and the selected frame
re-specified for its new location. Thereafter, method 2 loops
back over branch 258 to step 222 to enable the user to again
assess whether any further modifications to structure 100 are
called for. If the user finds that no further modifications to
the structure 100 are called for, method 2 exits the structure
modification sequence at branch 260.

With reference to FIG. 9, following completion of structure
development step 4, as noted, method 2 advances to step
6 where the user may select a frame he would like to
populate with information pages. Following designation by
the user of the frame he would like to populate, method 2
advances to step 8 where population of the selected frame is
undertaken.

Before describing program flow for frame population with
information elements; e.g. Web pages, a review of the
underlying information elements retrieval process would be
appropriate.

While methods previously known for computing relevance
exploit the annotative power latent in hyperlinks,
method 2 of the present invention seeks to determine what
a link "i" says about its destination information element;
e.g., page "j." To investigate this, method 2 defines a
numerical affinity extending from link i to page j denoted $a_{ij}$.

In general terms, method 2 features three steps:

1. Acquire a root set of entities to be analyzed, the root set
being acquired by generating an initial set of Web pages
with the use of a query derived from attributes of the
category frame the user is interested in, frame
attributes, as noted, selectively including frame classification
designations, example pages, stop pages, and/or
control parameters, as required, and which may
exclusively include example pages. Subsequently, the
initial set is supplemented and expanded based on
whether example pages and/or stop pages were specified.
Where example hubs were specified, preferably,
any page pointed to by an example hub is used to
supplement the initial set by including them with the
initial set. Further, in the case where example authority
pages were specified, the initial set is preferably
supplemented by including any page that points to at
least any two example authority pages. Additionally, to
the extent that stop pages have been specified in the
query, such stop pages are eliminated from the initial
set. Thereafter, the supplemented initial set is expanded
by including pages directly linked to pages of the
supplemented initial set; i.e., pages that are either
pointed to by pages of the supplemented initial set, or
pages that point to pages of the supplemented initial set,
which would include specified example hub pages and
specified example authority pages. Finally, the specified
stop pages would again be eliminated from the
expanded, supplemented initial set; i.e., root set, to
cover the possibility of stop pages having been drawn
in during the expansion process.

2. Approximately generate one or more eigenvectors of
two similarity matrices, described below, by means of
iterative updating, as also described below.

3. Analyze the resulting eigenvectors to facilitate ranking
and/or partitioning of the set of entities.
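
By way of illustration only, the root set acquisition of step 1 might be sketched as follows. This is a minimal sketch, not the implementation of method 2: the function name build_root_set, its parameters, and the in-memory dictionary out_links (mapping each page to the set of pages it points to) are all assumptions introduced here for the example.

```python
def build_root_set(initial, out_links, example_hubs=(), example_auths=(), stop_pages=()):
    """Sketch of step 1: supplement, prune, expand, and prune again.

    out_links maps each page to the set of pages it points to. It is an
    assumed stand-in for whatever link index is actually available.
    """
    stop = set(stop_pages)
    auths = set(example_auths)
    pages = set(initial)

    # Supplement: any page pointed to by an example hub joins the set.
    for hub in example_hubs:
        pages |= out_links.get(hub, set())

    # Supplement: any page pointing to at least two example authorities.
    for page, targets in out_links.items():
        if len(targets & auths) >= 2:
            pages.add(page)

    # First elimination of stop pages.
    pages -= stop

    # Expand by one link in either direction.
    expanded = set(pages)
    for page, targets in out_links.items():
        if page in pages:
            expanded |= targets      # pages pointed to by the set
        if targets & pages:
            expanded.add(page)       # pages that point into the set

    # Second elimination, in case expansion drew stop pages back in.
    return expanded - stop
```

In this sketch the example hubs re-enter the set during expansion because they point to pages already included, which agrees with the statement above that the expanded set would include the specified example pages.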

In the case where relevant sources are to be identified, 
step 2 above described proceeds as follows. 

Let "S" be the root set and "E" be the set of links between
pages in the root set. Further, let m=|E|, where m refers to
links i; and n=|S|, where n refers to pages j. Additionally, let
"A" be an m x n matrix representing the weight of each link
in connection with hub calculations, and "B" be an n x m
matrix representing the weight of each link in connection
with authority calculations, and where the contents of A and
B are as defined below. Still further, let a be an n vector
representing the authority value of each of the n pages.
Additionally, let h be an m vector representing the hub value
of each of the m links. With the above in mind, each round
of iteration comprises the following three steps:

1. Update authority scores: $a \leftarrow Bh$;

2. Update hub scores: $h \leftarrow Aa$; and

3. Re-pack; i.e., re-compute authority; i.e., a.

This process is repeated for as long as necessary to
achieve the desired result. In preferred form, five such steps
have been found sufficient.
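
The round of iteration just described might be sketched as follows. This is a hedged illustration under stated assumptions: the names matvec, repack and iterate are hypothetical, the matrices are plain nested lists, the starting hub vector of all ones is an assumption (no starting value is given in the text), and re-packing is taken to mean zeroing the authority of all but the highest-authority page of each Web site. The list site_of, giving a site label for each page, is likewise assumed.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def repack(a, site_of):
    """Zero the authority of all but the top-scoring page of each site."""
    best = {}
    for j, score in enumerate(a):
        site = site_of[j]
        if site not in best or score > a[best[site]]:
            best[site] = j
    return [score if best[site_of[j]] == j else 0.0 for j, score in enumerate(a)]


def iterate(A, B, site_of, rounds=5):
    """Run the three-step round; A is m x n, B is n x m."""
    h = [1.0] * len(A)       # assumed start: every link has hub value 1
    a = [0.0] * len(B)
    for _ in range(rounds):
        a = matvec(B, h)     # 1. update authority scores
        h = matvec(A, a)     # 2. update hub scores
        a = repack(a, site_of)  # 3. re-pack authority
    return a, h
```

No normalization is applied here because none is described; over many rounds the scores grow, so a practical version would likely rescale the vectors each round.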

While the contents of A and B may include contributions
from a number of factors affecting link weight, in preferred
form, method 2 includes contributions from such factors as:
textual content; self-promotion; spreading functions;
example pages; and copied pages.

Particularly, and with regard to the contents of matrix
$B=[b_{ji}]$, $b_{ji}$ is the weight of link i, which points to page j.
Initially, $b_{ji}$ is set to 1, the default hub link weight, if link i
points to page j, and is 0 otherwise. Thereafter, hub link
weight $b_{ji}$ is increased if the link can be considered to have
additional relevance due to one or more of such factors as:
being located in a page close to a search term, referred to as
"context relevance", context relevance being additive for
each occurrence of required proximity to a search term; or
being copied multiple times, termed "replication relevance";
or being illustrative of a preferred link, termed
"example relevance." Further, in accordance with method 2,
in the case where multiple relevance enhancing factors are
present, the weight of the link is successively increased
either additively or multiplicatively by each factor as
described below.

More specifically, in the case of context relevance, for
each query term that occurs within W words of link i, $b_{ji}$; i.e.,
link weight, is increased in value by an amount equal to
(W-d). In this context, d is the distance from the anchor text
of link i to the nearest occurrence of the query term, and may
have the value 0. In preferred form, W is set to 10. Further,
before the described context relevance value; i.e., W-d, is
added to the link weight, it is first multiplied by the factor
1.2 if the query term begins with a "+" sign; and by -0.2 if
the query term begins with a "-" sign, the "+" and "-" signs
being understood in conventional search fashion; and by 0 if
the query term is separated from the link i by an HTML
heading or horizontal line. Moreover, and as noted, such an
addition would be included for each instance of context
relevance.
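
The context relevance adjustment can be illustrated with a short sketch. The function name context_relevance and the shape of its term_hits argument (one tuple of query term, distance d, and a flag indicating separation by an HTML heading or horizontal line, per occurrence) are assumptions made for the example; the constants W=10, 1.2, -0.2 and 0 are those given above.

```python
def context_relevance(weight, term_hits, window=10):
    """Add (W - d), suitably scaled, to a link weight for each nearby query term.

    term_hits is an assumed input: an iterable of (term, d, separated)
    tuples, where d is the word distance from the link's anchor text and
    separated is True when a heading or horizontal line intervenes.
    """
    for term, d, separated in term_hits:
        if d > window:
            continue                 # term is not within W words of the link
        factor = 1.0
        if term.startswith("+"):
            factor = 1.2             # term marked "+" in the query
        elif term.startswith("-"):
            factor = -0.2            # term marked "-" in the query
        if separated:
            factor = 0.0             # heading or rule between term and link
        weight += (window - d) * factor
    return weight
```

For instance, starting from the default weight of 1, a plain term adjacent to the link (d=0) adds 10, a "+" term five words away adds 6, and a "-" term adjacent to the link subtracts 2.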

Subsequently, and yet further, where copied pages are
found; i.e., where replication relevance is present, and all but
one, deemed the original, are eliminated, to reflect the
significance of the page having been copied, the weight of the
links for the retained page, as described above, is increased by
a factor equal to the log of the number of copies. Further, this
replication relevance is applied as a multiplication factor to
the link weight as enhanced by any other relevance factor.

Still further, and thereafter, where example pages are
used, because of the importance of respective example
pages, the weight of the links within an example page is
likewise increased based on example relevance thereby
deemed present. Particularly, the weights of all links within
example hub pages are increased by a multiplication factor
of 1.1. In the case of example authority pages, the weight of
links within the page is increased by first identifying a page
region as defined by the occurrence of a page and/or section
heading, and/or ruled page line; page and section headings
and ruled lines being defined in conventional HTML fashion.
Subsequently, within the identified region, a window of
25 links forward and 25 links backward in the page from the
subject link is placed about a subject link. Thereafter, if there
is one example authority link within the window, a
multiplication factor of 1.1 is applied to the weight of the subject
link. Further, if there are two or more example authority
links within the window, a multiplication factor of 1.5 is
applied to the weight of the subject link. However, if no
example authority links are present within the window, no
multiplication factor is applied to the weight of the subject
link.
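
The windowing rule for example authority pages might be sketched as follows. The function name example_authority_factor and its arguments are hypothetical; links is assumed to hold, in page order, the destinations of the links of one already-identified page region, and the subject link itself is counted as lying within its own window, which the text does not settle.

```python
def example_authority_factor(links, subject, example_auths, span=25):
    """Return the multiplication factor for the link at index `subject`.

    links: destinations of the links in one page region, in page order
    (an assumed representation). The window covers `span` links on each
    side of the subject link, clipped to the region.
    """
    lo = max(0, subject - span)
    hi = min(len(links), subject + span + 1)
    hits = sum(1 for k in range(lo, hi) if links[k] in example_auths)
    if hits >= 2:
        return 1.5
    if hits == 1:
        return 1.1
    return 1.0   # no example authority nearby: weight is left unchanged
```

Returning 1.0 when no example authority link falls within the window expresses "no multiplication factor is applied" as a neutral multiplier.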

With regard to matrix A, in accordance with method 2,
matrix A is defined to be $B^{T}$, wherein the B matrix is as
previously defined, but is modified to take into account the
spreading of authority weight around neighboring links.

Particularly, consider two links, i and i'; then, let
$s(i, i') = g(|i - i'|)$ if i, i' are neighbors, and $s(i, i') = 0$
otherwise. Links i and i' are "neighbors" if there is no page
boundary, HTML heading or horizontal line separating
them. In this regard, g(n) is a truncated Gaussian function
well known in the art. Further, in preferred form, the
following values are provided: g(0)=1; g(1)=0.5; g(2)=0.1;
g(3)=0.01; and g(n>3)=0.

Under the noted considerations, $A=[a_{ij}]$, where
$a_{ij} = \sum_{i'} b_{ji'}\, s(i, i')$, the summation being taken over
$i' \in E$ such that i' points to j. It is to be understood i refers
to a link and j refers to a page and $a_{ij}$ is the weight of link i
that points to page j.
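
A brief sketch of how matrix A could be formed from matrix B with the spreading function follows. The names g, spread and build_A are hypothetical. Two assumptions are made: link indices follow page order, so that |i - i'| measures how far apart two links are, and a list region_of labels each link with its page region, two links being neighbors only when their labels match. Summing over all links i' is equivalent to summing over those that point to j, since $b_{ji'}$ is 0 otherwise.

```python
# Preferred values of the truncated Gaussian g(n); g(n) = 0 for n > 3.
G_VALUES = {0: 1.0, 1: 0.5, 2: 0.1, 3: 0.01}


def g(n):
    return G_VALUES.get(n, 0.0)


def spread(i, k, region_of):
    """s(i, i'): zero unless both links lie in the same page region."""
    if region_of[i] != region_of[k]:
        return 0.0
    return g(abs(i - k))


def build_A(B, region_of):
    """Form the m x n matrix A from the n x m matrix B.

    A[i][j] is the sum over links k of B[j][k] * s(i, k).
    """
    n, m = len(B), len(B[0])
    return [
        [sum(B[j][k] * spread(i, k, region_of) for k in range(m)) for j in range(n)]
        for i in range(m)
    ]
```

With three links, the first two sharing a region, a link inherits half the weight of its immediate neighbor's pointer while the third link, in a separate region, is unaffected by the others.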
Finally, re-packing authority step 3 above noted for each
round of iteration is performed by zeroing the authority of all
but the highest authority page of each Web site.

After the requisite set of iterations is complete, the hub
score of a page p is set to the sum, over all links i on the page,
of h[i].

In accordance with method 2 the basic computational
procedure is modified in several ways in order to remove
other spurious effects that adversely affect the noted
computation.

Particularly, to avoid "self-promotion"; i.e., accumulation
of spurious authority conferred on pages by pages of the
same Web site, in accordance with method 2, pages are
filtered so as to discard links from pages on a Web site to
pages on the same Web site. In accordance with method 2,
affinity between entities on the same Web site is thus
reduced, a Web site being understood to potentially encompass
either a part of, or all of a host, or several hosts. Two
pages are defined as being on the same Web site if they
satisfy the following test: for class A and class B IP
addresses, two pages are considered to be on the same Web
site if the two most significant octets of their respective
addresses match; for class C addresses, two pages are
considered to be on the same Web site if the three most
significant octets of their respective addresses match; and for
class D addresses, two pages are considered to be on the
same Web site if all four octets of their respective addresses
match.

Regarding "redundant hubs," the value of a hub page is,
by definition, in its links rather than its contents, i.e., "better"
hubs are hubs having greater numbers of links to quality
authority pages, quality, as noted, being assessed based on
the authority scores found for an authority vector during
computation, and the hub scores for a hub vector during
computation. Accordingly, if all the destinations accessible
from a particular hub are also accessible from "better" hubs;
i.e., hubs of greater relevance, that particular hub need not
be outputted. More generally, the method seeks to return a
set of hub pages that together contain as many unique,
high-quality links as possible. The method, therefore, filters
the pages by applying a well-known "greedy test" as follows:
once the iteration step has converged, the method
identifies the best hub; zeros the authority values of all pages
pointed to by that hub; re-computes hub values; and
thereafter, continues outputting the next best hub, zeroing
authority values of pages it points to, and so forth.

With respect to "related-page" factors, it has been found
that despite application of the "self-promotion" removal
procedures noted, it is possible, for instance, for a home
page of a Web site, and several children of that page to
accumulate authority. However, in the final output the
method seeks to provide the user as much authoritative
substance as possible in as small a number of pages as
possible. To achieve this, method 2 applies step 3 described
above, i.e., re-compute authority; i.e., a.

Continuing, spurious results have also been found to be
introduced into relevance computations by the now common
practice of Web site providers including in their sites material
copied from other Web sites. Because of the economic
and creative pressures on Web site providers to produce
"content", providers often copy pages or page parts from
others rather than generate new and original material for
their sites. Existence of multiple copies of hub and/or
authority pages, however, adversely affects relevance
computations. For example, multiple copies of hub pages
erroneously increase the authority weight of pages pointed to,
the same material being pointed to each time a hub is copied.
Likewise, multiple copies of authority pages also produce
problems. Particularly, copies of the same authority page
split; i.e. divide, the number of links pointing to the same
subject matter; i.e., the hub links pointing to the authority
subject matter are dispersed over the copies. As will be
appreciated, if there were only one copy of the authority, all
hub links for the authority would point to that one copy,
thereby consolidating the effect of the links. However, if the
hub links rather point to different ones of the multiple
authority copies, the total number of links that would
otherwise be available is dissipated over the multiple copies.
Accordingly, and as is apparent, the occurrence of "copied
pages" adversely affects accuracy of the relevance
computation.

Method 2 in preferred form also features steps for diminishing
the adverse effect on relevance computation caused
by copied pages. Specifically, method 2 features steps prior
to computation of relevance for determining whether two or
more pages can be considered copies of one another with the
use of a "similarity" checking procedure, canceling all but
one of the pages, the retained page being deemed the
original, redirecting the links to the copies found to the page
deemed the original, and increasing the weight of the links
from the page deemed the original by using a multiplication
factor applied to link weight representing the significance of
the multiple copies of the original page having been made.
Particularly, in preferred form, the similarity check is
undertaken with the Shingles algorithm developed by DEC SRC,
an affiliate of the Compaq Computer Corporation. Further,
the multiplication factor used to increase link weight for
links of copied pages is made equal to the log of the number
of copies found of the page.

Finally, with regard to "false authority", it has been found
that many resource compilations such as bookmark files
contain pages pertaining to a number of disjoint topics. This
causes such compilations to falsely become good hubs,
which in turn cause irrelevant links from the same page to
falsely become good authorities. To address this problem,
method 2 notes that pointers to pages on the same topic tend
to be clustered together in resource compilations. Method 2,
therefore, filters the pages by allowing each link in a Web
page to have its own hub value so the hub value of the page
becomes a function of the particular link rather than a
constant. When computing authority values, the authority of
the destination is incremented by the hub value of the link.
When computing hub values, the authority value of the
destination is used to increment the hub value of the source
link and, according to a spreading function, the hub values of
neighboring links. Thus, useful regions of a large hub page
can be identified and the effects of irrelevant portions of the
page diminished. The final hub value of a page is the sum of
all the hub values of its links.

Method 2 applies the filtering procedures so as to be
consistent with the described matrix computational framework
and to enable an acceptable degree of convergence.
Particularly, the "self-promotion" and "redundant hub"
filtering procedures are arranged, respectively, as "pre" and
"post" processing steps, and the "false authority" filtering
procedures are arranged as a linear transformation that may
be expressed as a matrix multiplication.

Continuing, in the above discussion, "actual" links that
expressly connect a first page with a potentially relevant
second page are presumed. However, in accordance with the
invention, method 2 would also apply to "virtual" links; i.e.,
links that may be inferred based on similarity between pages
not expressly linked. In the case of a virtual link, all that
is required is to adjust the affinity indices, $a_{ij}$, $b_{ji}$, where
"j" indicates the destination page and "i" indicates the
virtual link to represent the relationship between the pages
considered virtually linked. Once that has been done, the
computation proceeds as described for actual links.

For example, in accordance with the invention, it has been
found that pages can be virtually linked based on the number
of commonly occurring terms between them, relative to the
number of terms in the reference page. Specifically, a virtual
link i may be inferred where a page j includes n terms
common to a page j' having t total terms. For this virtual link,
the affinity index $a_{ij}$ would be set equal to the number of
terms in page j that are common to the terms in page j'
divided by the total number of terms in page j', the virtual
link having the direction of the reference page j' to page j
having the common terms. As will be appreciated, in addition
to the functional relationship for $a_{ij}$ described, other
linguistic approaches applying natural language processing
techniques could also be used to provide understanding of
the document content and to, thereby, develop virtual links.

Returning to the description of program flow for method
2 and, particularly, population of a selected frame of the
information organization structure with information elements;
e.g., Web pages, attention is directed to FIG. 12, in
which frame population step 8 of the general method
description presented in FIG. 9, is shown in greater detail.
Once again, and as noted above, for simplicity, the more
specific term "Web page(s)" or "page(s)" will be used in the
following discussion, it being understood, however, that the
more general term "information element(s)" is to be understood.
Further and as noted above, the reference to Web
pages includes a part of page, a whole page or combination
of pages but functions as the unit of reference. As seen in
FIG. 12, following selection of a frame to be populated at
interface 138, method 2, at step 260 calls for the page
population of the selected frame to be viewed by the user to
assess whether the population is acceptable, or whether
further searching and populating is desired. In accordance
with method 2, and in preferred form, the user undertakes
viewing of the selected frame page population at partition
178 of interface screen 176 described above and shown in
FIG. 6.

As seen in FIG. 6, specifically, at interface partition 178,
in the case where the frame selected had been previously
populated; e.g., the frame of a pre-existing information
organizational structure, or a frame that had been previously
populated, as where a search had been previously conducted,
the user is presented with authority list 180 and hub list 182,
including the, respective, authority and hub pages for the
frame collected with the use of prior searches and associated
queries. More specifically, in preferred form, method 2
includes steps for presenting at list 180 and 182, the titles of
the pages previously collected ranked by authority weight
and hub weight, respectively.

As will be appreciated, this presentation of collected
pages ranked by respective authority and hub weights
enables the user to quickly assess whether the population of
the frame is acceptable. Further, this form of presentation is
rendered yet more effective in aiding the user's evaluation
when combined with the ability of method 2 to provide
distilled lists of only the highest weight authority and hub
pages, respectively.

And, as described above, to additionally aid the user's
evaluation, method 2 includes program steps which enable
the user to randomly select any authority page or hub page,
and view its content and links at screen partition 204 shown
in FIG. 7. As is apparent, the ability of method 2 in preferred
form to enable the user to view and analyze page content,
query terms and links quickly and easily yet further assists
the user's evaluation of the selected frame's page population.
Particularly, based on such presentation, the user is able
to identify authority pages and hub pages suitable for use as
example pages in the fashion described above to bias
iteration of page identification in a desired direction.
Likewise, in a comparable fashion the user may also
identify pages tending to bias identification in undesired
directions, and thereafter [illegible] stop pages [illegible]
suppress such effect.

Though for sake of simplicity not shown in FIG. 6,
method 2 in preferred form [illegible] summary information
[illegible] total of the population, associated
weight ranges and other search result information to aid the
user's evaluation at interface screen partition 178. As will be
appreciated by those skilled in the art, such information may
be readily obtained from the computation results described
and presented in conventional fashion at partition 178.

Continuing, though the description of step 260 shown in
FIG. 12 has been given for the case where the selected frame
includes a pre-existing page population, it should be
understood, comparable steps would apply in the case where
the frame had been newly designated, and no prior search
conducted. In case of a newly designated, unsearched frame,
no page population, of course, would yet be available for the
user to review. However, as will be described below, other
components of interface 138, such as display of the title for
the selected frame at partition region 186 and the frame
attributes at region 184, along with editing partition 190 and
organizational structure partition 188 would be available to
the user. In this regard, in accordance with method 2, it
should be understood that example pages could exclusively
be entered as query terms.

Continuing with reference to FIG. 12, once the user has
viewed and fully analyzed the page population for the
selected frame at step 260, program flow advances to step
262 at which method 2 enables the user to indicate whether
the population is acceptable. In the case where the population
is acceptable, and the user so indicates, program flow
advances over branch 266 and exits step 8, as best seen in
FIG. 9, to proceed to step 10 where method 2 enables the
user to select another frame and associated population for
preview.

On the other hand, and with reference again to FIG. 12,
in the case where the user at step 262 finds the page
population for the selected frame to be unacceptable, and so
indicates, method 2 program flow advances over branch 264
to step 268 where the user can modify the selected frame's
attributes in order to generate a new search query and
retrieve a new collection of pages. In accordance with
method 2, to facilitate correlation of the selected frame page
population with the frame description, method 2 includes
steps for enabling the attributes of the frame to be employed
as the query terms for the search.

In this regard, and as noted above, frame attributes may
also include, and, indeed, exclusively include identification
of example hub pages and authority pages, the identities of
which may be made part of a query to bias the relevance
computation in desired directions. Additionally, and as
noted, query terms may also include stop pages, i.e.,
identification of pages for avoidance which are found to bias the
relevance computation in undesired directions, as well as
parameters helpful for retrieving the initial set, such as
control parameters helpful for managing the extent and
amount of CPU, memory and storage resources used during
searching, as are well known in the art.

Accordingly, in the case where a pre-existing population
is available at the selected frame, all the user need do to
adjust the frame page population, is to adjust the frame
descriptors, i.e., frame attributes, to, thereby, generate new
query terms which, in turn, will be employed by method 2
to automatically retrieve a new set of pages to populate the
frame. In the case where the frame is newly designated, and
no acceptable population yet exists, the user would employ
the descriptors; i.e., attributes, for the frame to enable
method 2 to automatically retrieve a beginning population.

In preferred form, method 2 includes steps for permitting
the user to easily and conveniently adjust the attributes for
the selected frame. Particularly, method 2 includes steps for
enabling display interface 138 to present the attributes for
the selected frame so they can be readily adjusted. As shown
in FIGS. 6 and 7, screens 176 and 202 include editing
partition 190 having the same elements, 192 to 198, for
enabling the user to conveniently modify frame attributes
and, accordingly, the search query terms. Specifically, editing
partition 190 at screens 176 and 202 includes display field
192 for presenting the current form of the frame attributes,
and entry field 194 for enabling the user to add frame
attributes; i.e., descriptors, to the current form of the frame
attributes to further refine the search query. Additionally,
editing partition 190 is seen to include entry field 196 for
enabling the user to expressly exclude attributes which the
user believes would not be helpful; i.e., not relevant.

As well, editing partition 190 is also seen to include 
feature palette 198 in the form of a pull-down menu, which, 
as described above includes a listing of predefined frame 
attribute features. More specifically, in preferred form, 
method 2 includes steps for associating the feature palette
menu items with look-up tables or libraries of frame 
attributes; i.e., search query terms, known to produce pages 
of high quality; i.e., authority, for the respective attributes 
listed in the menu. 

Still further, and as seen in FIGS. 6 and 7, method 2 in
preferred form also includes steps for enabling attribute
editing partition 190 to also include entry field 191 for
permitting the user to specify the number of pages to be
retrieved. For example, if the user wishes to retrieve only
five or six of the highest authority pages found in a search,
or for that matter, only the highest authority page, he can do
so with an appropriate entry at field 191. As well, method 2
also includes steps for enabling partition 190 to include
controls 193, 195, 197, for permitting the user to specify,
respectively, example authority and hub pages and at field
197 to specify specific pages to be excluded as described
above. In this regard, and as noted, control button means, not
shown, located at interface partition 178 associated with the
lists of authorities and hubs could additionally or
alternatively be used as well.

Continuing with reference to FIG. 12, following step 268,
program flow advances to step 270, at which method 2
includes steps for automatically composing a query for
initiating a search based on the frame attributes identified by
the user, frame attributes as noted potentially including
classification descriptors, example pages, stop pages and/or
control parameters as selectively combined by the user, and
retrieving an initial set of information pages. More
specifically, and as shown in FIG. 13, method general step
270, in preferred form, first includes the more specific step
280 of forming a search query based upon the frame
attributes the user has entered at editing partition 190, either
at screen 176 or 202. Thereafter, method 2 includes step 282,
for parsing the query generated at step 280 to produce, in
conventional fashion well known in the art, a series of search
terms, and to, subsequently, at step 284 undertake a search
of the World Wide Web based on the parsed query, again in
conventional fashion; e.g., using query syntax, common to
Web search engines. Finally, following the search, method 2
returns an initial set of information pages at step 286.
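
Steps 280 and 282 can be illustrated with a minimal sketch. Everything named in it is hypothetical: the FrameAttributes class, the form_query and parse_query helpers, and the leading-minus exclusion syntax are assumptions chosen for illustration, not a query format prescribed by the method. Descriptors are assumed to contain no quotation marks.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class FrameAttributes:
    """Hypothetical container for the values entered at editing partition 190."""
    include_terms: list                                 # descriptors (fields 192 and 194)
    exclude_terms: list = field(default_factory=list)   # excluded attributes (field 196)


def _quote(term):
    # Quote multi-word descriptors so they survive parsing as one search term.
    return f'"{term}"' if " " in term else term


def form_query(attrs):
    """Step 280 (sketch): compose one query string from the frame attributes."""
    parts = [_quote(t) for t in attrs.include_terms]
    # Assumed syntax: a leading minus marks a term to be excluded.
    parts += ["-" + _quote(t) for t in attrs.exclude_terms]
    return " ".join(parts)


def parse_query(query):
    """Step 282 (sketch): split the query into required and excluded terms."""
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-"):
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


attrs = FrameAttributes(["BMW", "restoration"], ["for sale"])
query = form_query(attrs)
required, excluded = parse_query(query)
```

The parsed term lists would then be handed to whatever search facility performs step 284; that call is deliberately left out because the text does not name one.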

Continuing with reference to FIG. 12, once method 2 
identifies and returns the initial set of pages based on the 
frame attributes which the user entered at step 270, program 
flow advances to step 272, at which method 2 includes steps 
for automatically expanding the initial set to form a "root 
set" of pages. To accomplish this, method 2 includes a 
sequence of more specific procedures better seen in FIG. 14. 

Particularly, following return of the initial set of pages 
responsive to the query, the initial set is supplemented based 
on whether example pages and/or stop pages were specified. 
Particularly, in the case where example hubs were specified, 
preferably, any page pointed to by an example hub is used 
to supplement the initial set; i.e., brought into the initial set. 
Further, in the case where example authority pages were 
specified, the initial set is preferably supplemented by 
including any page that points to at least any two example 
authority pages. Additionally, to the extent that stop pages 
have been specified in the query, such stop pages are 
eliminated from the initial set. Further, once the initial set is 
supplemented as described, the supplemented initial set is 
then expanded by including pages directly linked to pages of
the supplemented initial set; i.e., pages that are either 
pointed to by pages of the supplemented initial set, or pages 
that point to pages of the supplemented initial set, which, as 
will be appreciated, would include specified example hub 
pages and specified example authority pages. Finally, the 
specified stop pages would again be eliminated from the 
expanded, supplemented initial set; i.e., root set, to cover the 
possibility of stop pages having been drawn in during the 
expansion process. 
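
The supplementing and expansion just described can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the method: the build_root_set function and its out_links argument (a dictionary mapping each page to the set of pages it points to) are hypothetical stand-ins for the link information that would be gathered from the Web.

```python
def build_root_set(initial, out_links, example_hubs=(), example_auths=(), stop_pages=()):
    """Sketch: supplement the initial set, expand it by one link, drop stop pages.

    out_links is a hypothetical map: page -> set of pages that page points to.
    """
    stop = set(stop_pages)
    auths = set(example_auths)

    # Invert the link map so pages pointing INTO a given page can be found.
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)

    supplemented = set(initial)
    # Any page pointed to by an example hub is brought into the initial set.
    for hub in example_hubs:
        supplemented |= out_links.get(hub, set())
    # Any page pointing to at least two example authorities is brought in.
    for page, dsts in out_links.items():
        if len(dsts & auths) >= 2:
            supplemented.add(page)
    # Stop pages are eliminated from the supplemented initial set.
    supplemented -= stop

    # Expand by pages directly linked to or from the supplemented set.
    root = set(supplemented)
    for page in supplemented:
        root |= out_links.get(page, set())
        root |= in_links.get(page, set())
    # Eliminate stop pages again in case expansion drew them in.
    root -= stop
    return root


links = {
    "p318": {"p304"},            # example hub pointing to p304
    "p308": {"p340", "p342"},    # points to two example authorities
    "p302": {"p314"},
    "p320": {"p306"},
    "p310": {"spam"},
}
root = build_root_set(
    {"p302", "p306", "p310"}, links,
    example_hubs=["p318"], example_auths=["p340", "p342"], stop_pages=["spam"],
)
```

The page names loosely echo the reference numerals of FIG. 8 but the link structure is invented for the example.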

In this regard, the method thus includes steps for gener- 
ating an initial set of pages based upon frame attributes as 
described, and then through an iterative process of issuing 
queries and following links into and out of already fetched 
pages, the iteration is carried out until as described the initial 
set is supplemented and expanded to form the "root set" 
upon which later computation can be performed. 

As seen in FIG. 14, general expansion step 272 shown in 
FIG. 12, first includes step 285 for controlling program flow 
depending upon whether or not example hubs were specified 
in the query giving rise to the initial set. In the case where 
no example hubs were specified, method 2 proceeds to step 
289. However, in the case where example hubs were 
specified, the initial set is supplemented with pages pointed 
to by the example hubs. Thereafter, method 2 progresses to 
step 289. At step 289 method 2 determines whether or not 
example authorities were specified. If no example authori- 
ties were specified, program flow continues to step 293. 
However in the case where example authorities were speci- 
fied in the query, where pages are found that point to two or 
more of the example authorities, such pages are also added 
to supplement the initial set. Thereafter, in accordance with
method 2 processing proceeds to step 293 where program 
flow is directed depending upon whether stop pages were 
specified in the query. In the case where no stop pages were 
specified in the query, program flow advances to step 288. 
However, if stop pages were specified, program flow in 
accordance with method 2 advances to step 295, at which the 
initial set is supplemented by deleting any specified stop 
pages. Thereafter, method 2 progresses to step 288. 

At step 288, method 2 includes steps for parsing; i.e. 
extracting, from the initial set of pages the links to other Web 
pages potentially relevant to the original query formed from 
the frame attributes the user entered. In accordance with the 
invention, method 2 includes steps which seek to identify the
links available in the supplemented initially returned pages
on the understanding and belief that those links include
intelligence, originally planned into the pages by the page
creators, which intelligence is likely to identify other pages
of potentially greater authority concerning the search terms
included in the initial query.

As will be appreciated, and as is known in the art, when
Web page creators craft pages, they typically include "links"
to other pages available on the Web which either support,
authenticate or otherwise relate to the subject matter
included in their own pages. Accordingly, by identifying
those links, method 2 of the present invention seeks to build
on the assets in the form of links, provided by former page
creators, as method 2 seeks to identify the higher levels of
authority concerning the subject matter of interest.

As noted above, links include, and, thereby, identify two
types of pages which define the link relationship. The first,
the page that originates the link, termed a "hub"; i.e. the page
that points to another page of presumed potentially greater
relevance on the subject matter, and the second, the page that
receives the link termed the "authority." In accordance with
method 2, though it is understood that not all links will
necessarily point to pages of higher authority on the subject
matter of interest, it is recognized that such links constitute
a starting point from which pages of higher authority may be
identified upon application of a proper sequence of refining
steps in accordance with method 2.

This relationship, and the procedures for supplementing
and expanding the initial set may better be understood with
reference to FIG. 8. In FIG. 8, the initial page set retrieved
is designated 300, and shown to include pages 302, 306 and
310 coupled by links, represented as arrows, commonly
designated 312. Additionally, initial set 300 is also seen to
include, and be supplemented by pages 304 and 308 shown
with interrupted outlines to indicate that pages 304, 308 are
not within the initial set immediately following return of the
initial set. More specifically, page 304 is seen to be a page
pointed to by an example hub represented by page 318,
denoted P 8, page 304 having been added to supplement the
initial set in accordance with method 2 as described. Further,
page 308 is seen to be pointing to two example authority
pages 340, 342, respectively designated P 19 and P 20, thus,
again having been added to supplement the initial set in
accordance with method 2 as described. Still further, though
for simplicity, stop pages have not been shown, it is to be
understood that in the case where stop pages are specified,
in accordance with method 2, they would have been
removed from the initial set as the initial set is supplemented.

Continuing, FIG. 8 is also seen to include a group of
extended pages 314 to 352 located not more than one link
away from the pages of the supplemented initial set, the
links, again, shown as arrows, commonly designated 312. As
will be appreciated, initial pages 302, 306, and 310,
supplemented with page 304 based on example hub 318 pointing
to page 304, and page 308 based on page 308 pointing to at least
two authority pages 340, 342, as combined with extended
pages 314 to 352 constitute the expanded; i.e., root, set of
pages designated 354. As shown in FIG. 8, pages 302, 306
and 310 of the initial set each include highlighted blocks
generally designated 356 representing occurrences of one or
more of the search query terms, referred to in the art as
"hits", and highlighted blocks generally designated 358
indicating sources for links 312. In this depiction, "hub"
pages would be those at which a link arrow tail is located and
"authority" pages those at which a link arrow head is placed.

Upon contemplation of FIG. 8, the logic underlying use of
an initial search to aid in identifying pages of higher
authority becomes clear. Given the observation that links
represent intelligence in the pages identifying potential
sources of authority concerning page subject matter, if pages
having relevance to the initial query are first identified with
search "hits" in conventional fashion, links associated with
those hits are likely to identify potential sources of authority
concerning page subject matter. That being the case, once
the potential sources of authority; i.e., links, are identified,
it remains to, thereafter, optimize their potential
identification of subject matter authority, and to subsequently filter
spurious effects arising during the optimization process.
However, while this approach at first blush may seem
straightforward, it has been found necessary, as noted above,
to identify the multiple sources of spurious effects which
introduced error into the optimization computation, and to
develop procedures and combinations of procedures for their
removal. And, those identifications and developments have
by no means been immediately apparent.

Moreover, it is likewise not immediately apparent how
effective the use of example pages and stop pages can be in
directing return of high relevance pages.

Returning to FIG. 14, following the parsing of links from
the pages of the supplemented initial set at step 288, method
2, thereafter, includes step 290 for retrieving pages linked in
the fashion depicted in FIG. 8 to pages of the supplemented
initial set. In preferred form, the procedures comprising step
290 include employing a "crawler" or comparable means,
well known in the art, for investigating the Web and
retrieving pages linked to pages of the initial set. Additionally,
method 2 in preferred form also includes steps for using
preestablished reference libraries which identify sources of
authority associated with the links identified in the initial set.
In this regard, in preferred form, in order to find pages that
linked to a particular page, method 2 employs search engines
that provide "inlink queries", known in the art, to facilitate
recovery. Additionally and in preferred form, method 2 may
also maintain an index engine for providing inlinks and key
word queries locally; i.e., in connection with the described
identification process.

Once method 2 automatically retrieves the pages for the
expanded set, the expanded pages are combined with the
pages of the supplemented initial set at step 292 shown in
FIG. 14. Continuing, upon completion of step 292, method
2 proceeds to step 297 where stop pages, if identified in the
query are again checked for deletion at step 299 in the event
they are brought in during expansion of the supplemented,
initial set.

Continuing, upon completion of step 299 shown in FIG.
14, general step 272 in FIG. 12, for expanding the initial set
into the root set is concluded, and method 2 advances to step
274 for automatically ranking the pages in terms of
authority. As with step 272, step 274 shown in FIG. 12 is general
in character and actually comprises a number of more
specific steps introduced in FIG. 15. With reference to FIG.
15, general step 274 is seen to comprise first, more specific
step 360 for filtering the root set.

As pointed out above, sources of spurious effects
adversely affect computation of page ranking; i.e. authority
determination. Moreover, in accordance with the invention,
it has been found advantageous to suppress sources of
spurious effects in a particular sequence, which sequence is
dependent upon the nature of the spurious effect sought to be
eliminated. Particularly, due to its character,
"self-promotion" has been determined to be a source of a spurious
effect which is advantageous to eliminate at the outset of




computation. As explained, self-promotion arises from links
between pages of the same Web site. Specifically, pages of
the same Web site have been discovered to artificially
confer authority on each other. As will be appreciated,
method 2 is interested in links which it is believed a
Web-page creator undertook to independently identify and
include. As a result, where pages are believed identified
based on some bias; e.g., coming from the same source, they
may not meet the noted criteria, and accordingly, should be
avoided.

To overcome this problem method 2, includes filtering
procedures at step 360 shown in FIG. 15. Particularly, it has
been found that self-promotion effects can be reduced if
links between pages of the same site are disregarded. And,
still further, it has been found effective in implementing this
procedure to define pages as being of the same site based on
the nature of the site address. Specifically, in accordance
with method 2, pages are defined as being from the same
Web site if they satisfy the following test: for class A and
class B IP addresses, two pages are considered to be on the
same Web site if the two most significant octets of their
respective addresses match; for class C addresses, two pages
are considered to be on the same Web site if the three most
significant octets of their respective addresses match; and for
class D addresses, two pages are considered to be on the
same Web site if all four octets of their respective addresses
match.

Following procedures for reduction of self-promotion
factors at step 360, method 2, as shown in FIG. 15, advances
to step 362 for generating page relevance weights. As in the
case of the steps shown in FIG. 12, step 362 seen in FIG. 15
is itself, general in character and, in accordance with method
2, includes a series of more specific steps. Particularly, and
as shown in FIG. 16, step 362 for generating page relevance
weights, first includes step 366 for generating a collection of
nodes linked together by edges as described above.

Thereafter, at step 368 shown in FIG. 16, the authority
weights and hub weights for respective pages are initially
calculated in accordance with the relationship above
described.

Particularly: where "S" is the root set and "E" the set of
edges; i.e., hyperlinks, between pages in the root set, then,
if m = |E|, where m refers to links i, and n = |S|, where n refers
to pages j, and "A" is an m × n matrix representing the weight
of each link in connection with hub calculations, and "B" is
an n × m matrix representing the weight of each edge in
connection with authority calculations, the weight
computation can proceed if a is an n vector representing the
authority value of each of the n pages and h is an m vector
representing the hub value of each of the m hyperlinks. With
the above in mind, each round of iteration comprises the
following three steps:

1. Update authority scores: a ← Bh;

2. Update hub scores: h ← Aa; and

3. Re-pack; i.e., re-compute authority; i.e., a.

This process is repeated for as long as necessary to
achieve the desired result. In preferred form, five such steps
have been found sufficient.

Thereafter, as shown in FIG. 16, the method 2 advances
to step 370 where in accordance with the invention, the
weight computations are modified to include amplification
factors, as previously described, to note potential relevance
of a link to the original in the computation, as for example,
"context relevance", "replication relevance" and "example
relevance" as described above.

Following computation of the modified page weights at
step 370, method 2, as shown in FIG. 16, advances to step
372 where the modified weights may be suitably filtered as
required to reduce spurious factors.

Continuing, upon completion of step 372 shown in FIG.
16 method 2, concludes step 362 shown in FIG. 15, and
advances to step 364 for iteratively determining page
authority and hub scores.

As with other steps, step 364 in FIG. 15 is general
in character, and is actually comprised of more specific steps
better seen with reference to FIG. 17. As seen in FIG. 17,
general step 364 for iteratively determining page authority
and hub scores first includes step 374 for updating hub and
authority scores. As will be described, this "updating" step
includes both score recalculation and "false-authority"
suppression procedures.

As described above, resource compilations such as
bookmark files, commonly contain pages pertaining to a number
of unrelated topics. Further, it has been found that because
of this, such compilations tend to falsely become good hubs,
which in turn causes irrelevant links from the same page to
falsely become good authorities, this problem being referred
to as "false-authority." To address this problem, it has been
determined effective in reducing such false-authority effects
in accordance with method 2, to provide remedial
procedures at this point in the computation; i.e., during iterative
determination of hub and authority scores.

Particularly, to reduce false-authority effects, method 2
includes procedures for allowing each link in a page to have
its own hub value so that the hub value of the page can
become a function of the particular link rather than a
constant. Further method 2 includes procedures when
computing authority values, for incrementing the authority of a
page with the hub value of the link which points to it. And,
when computing hub values, method 2 includes procedures
for using the authority value of the page pointed to, to
increment the hub value of the link which points; i.e., acts
as a hub, and according to a spreading function, the hub
values of neighboring links. As a result of this, method 2 is
able to identify useful regions of a large hub page, and
diminish the effects of irrelevant portions of the page. As
will be appreciated, the final hub value of a page is the sum
of the hub values of its links as described in detail above.

Continuing with reference to step 374, as will be
appreciated, the procedures for diminishing related-page
effects, as described above, require that they be carried out
during a recalculation of hub and authority scores.
Accordingly, step 374 in accordance with method 2
integrates both procedures for diminishing related-page effects
and recalculation of page hub and authority scores.

As shown in FIG. 17 following recalculation of page hub
and authority scores, and suppression of related-page effects
at step 374, program flow advances to step 382 where the
results of step 374 are further filtered to remove yet
additional sources of spurious effects before required
recalculation is undertaken. As described, in addition to
self-promotion, related pages from the same Web site; e.g., a
home page and several sub-pages of the home page, can
improperly accumulate authority weights, giving rise to
spurious factors in the form of "related-page" promotion,
which adversely affects relevance computation accuracy. In
accordance with the invention, it has been found effective to
reduce related-page effects at this point in the method; i.e.,
following step 374 and before iteration of recalculation for
convergence. To accomplish this, method 2 includes
procedures at step 382 after iteration of the computation, for
re-packing the authority of all sites. Specifically, step 382
includes procedures for setting to 0 all, but the page with the
largest authority of the same site, the same site being defined
in the fashion described in connection with the
self-promotion filtering procedure.

Following completion of step 382, method 2 advances to
step 376, where method 2 determines whether hub and
authority scores have converged toward final values
sufficiently to forego further recalculation. As will be
appreciated, the interdependence of page hub and authority
weights causes page hub and authority scores to reach final
values suitable for ranking purposes; i.e., converge
adequately for purposes of ranking in accordance with
method 2. In this regard it is to be noted that the exact value
of hub and/or authority value for a page is not as significant
as approximating the page's respective hub and authority
relative values for ranking purposes. Therefore, in
accordance with the invention, method 2 includes procedures for
establishing a criteria, which in preferred form includes
performing the iteration 5 times. Accordingly, if final values
have not been reached, program flow proceeds over branch
380 to iterate; i.e. to step 374 over program flow branch 384
to step 374 for recalculation of hub and authority scores as
previously described. Thereafter, once the required
iterations have been accomplished and final values reached,
program flow exits step 376 at branch 378.

Following procedures for reaching acceptable final
values, program flow advances over branch 378, thereby,
concluding step 364 shown in FIG. 15 for iteratively
determining page hub and authority scores, and accordingly also
concluding step 274 shown in FIG. 12 for ranking pages of
the root set by authority, it being understood that the results
of the iterative recalculation provide page number
identifications ordered by page authority and hub score values; i.e.,
a distribution of scores that represent the degree of relevance
for the respective pages, which scores are ordered by
numerical amount to establish ranking of the pages.

Continuing with reference to FIG. 12, upon completion of
step 274 for ranking the pages of the root set, method 2
advances program flow to step 278 for truncating the ranked,
root set pages i.e., reducing the set size to the number of
pages desired by the user. It will be recalled that in
connection with description of interface screens 176 and 202,
method 2 in preferred form includes steps for providing
editing partition 190 with entry field 191 in which the user
may specify the number of pages he would like returned
following automated determination of pages having the
highest authority relative to the frame attributes of the frame
selected for population. Method 2 at step 278 includes
procedures for effecting the user's designation of pages
to be returned, which designation the user enters at field 191
of interface partition 190.

As in the case of other steps, method step 278 shown at
FIG. 12, is general in character, and includes more specific
steps seen in greater detail in FIG. 18. As presented in FIG.
18, general step 278 first includes step 386 for filtering the
ranked pages of the root set.

As noted above with respect to "redundant hubs," the
value of a hub page is, by definition, in its links rather than
its contents, i.e., "better" hubs having greater numbers of
links to authority pages. Accordingly, if all the destinations
accessible from a particular hub are also accessible from
"better" hubs; i.e., hubs with greater numbers of links, that
particular hub need not be outputted. Since method 2 seeks
to provide the smallest number of hub pages that together
contain as many, unique, high-quality links as possible,
method 2 accordingly includes procedures for removing
redundant hubs. Specifically, method 2 at step 386 shown in
FIG. 18 includes procedures to iteratively generate hubs and
authorities. Method 2 outputs the authorities as they stand,
however, with respect to hubs, method 2 performs the
following filtering. Method 2 begins by simply outputting
the best hub, and then diminishes the authority scores of
everything that hub points to. Given the new scores, method
2 recomputes hub values. Thereafter, method 2 again outputs
the best hub, and diminishes the scores of pages it points to,
and thereafter, continues this process until completion; i.e.,
a number of hubs acceptable to the user have been
returned.

Thereafter, following procedures for diminishing the
effect of redundant hubs, method 2 advances to step 388,
where as described above, the number of pages in the root
set is limited to the value entered by the user at field 191 of
interface editing partition 190.

Following execution of step 388, method 2, thereby,
concludes step 278 of FIG. 12 for truncating the ranked
pages and advances to step 278 where the truncated ranked
root set of highest hub and authority pages are returned for
populating the selected frame as shown in FIG. 12, and as
shown in FIG. 9, concluding step 8. Finally, upon
completion of frame populating step 8, subject to method 2
receiving no indication of any remaining frames to be populated at
step 10 and no indication of further modifications to the
frame hierarchy at step 16, program flow advances over
branch 20 to finish.

With regard to operation of method 2 in a typical
application, if a user wished to develop a set of
high-authority information pages relating to a particular
question; as for example, the development of Web
information pages concerning the restoration of a BMW of interest
to the user, the method would proceed as follows.

The user would initially activate his personal computer 68
shown in FIG. 1, call up method 2 as embodied in a software
application stored at user computer 68, and when prompted
at method step 80 shown in FIG. 10, identify an information
organizational structure such as structure 100 shown in FIG.
2.

Upon the user's identification of structure 100, method 2
would generate screens 140 of interface 138 at user's monitor
132 shown in FIG. 5. Thereafter, when presented with screen
140, the user could interactively select frame addition tool
148 at partition 144 and include a new information frame
128 at structure 100 shown in screen partition 142. In
accordance with the method, before entering the proposed
new frame, the user would be free to review structure 100 to
determine where the new frame concerning "BMW
restoration" would best be placed. Following that review, the user
might judge that since the subject of BMW Restoration
applied more to "Business", than to either "Entertainment",
"Science", or "News", it would better be placed somewhere
beneath frame 108 in the hierarchy. Further upon additional
review, the user would likely judge that BMW Restoration
would better fall under the subcategory "Companies" at
frame 116, than under the subcategory "Finance" at frame
118, and indeed under the frame "Products & Services" at
frame 122 rather than "Computers" at frame 120.
Accordingly, the user would likely place proposed new
frame 128 for BMW Restoration beneath frame 122
concerning "Products & Services" in structure 100. Of course,
and as noted above, in view of the flexibility afforded by
method 2, the user could place proposed new frame 128
anywhere in structure 100 that he liked on the understanding
that new frame 128 would be subject to inclusion of any
"parent" frames it was associated with, and compatibility of
its attributes with any frames which depended from it.

Following placement of new frame 128, the user would
identify the attributes to be associated with the frame,
particularly, classification descriptors, example pages and/or
stop pages as described previously. More specifically, the
user might identify pages of known BMW user groups that
include hubs pointing to known BMW restoration facilities
and/or techniques. Additionally, the user might identify
known authority pages concerning case studies of BMW
restorations. Further, the user might enter stop pages
concerning BMW parts or pages concerning sale of restored
BMWs known not to be of relevance.

In a case at hand, however, for purposes of simplicity of
illustration, the frame attributes would be merely the 
descriptors employed by the user to identify the frame, 
specifically, the frame title, here "BMW Restoration." 

Next, the user would navigate to the screen 176 of
interface 138, as for example, by selecting the display mode
"Frame" 172 at region 168 of partition 152 shown in FIG.
6. At screen 176, while partition 178 would not include lists
180 and 182, having respectively, authority and hub pages,
no population procedure having yet been undertaken, the
user, nonetheless, would be presented with editing partition
190 at which he could modify the frame attributes; e.g. by
providing entry of frame attributes to be included at field
194, or attributes to be excluded at field 196.

Following finalization of the frame attributes, and
designation of the number of pages the user would like to
populate new frame 128 with at field 191, and on the
assumption that no modification to the frame structure was
desired, the user would provide authorization for method 2
to automatically populate frame 128 with a set of ranked
pages directed to the frame attributes, specifically, "BMW
Restoration."

Thereafter, method 2 would undertake automatic formulation
of a query based on the frame attributes, particularly,
"BMW" and "Restoration" as shown generally at step 270 of
FIG. 12. Next, method 2 would retrieve an initial set of
pages 300 as shown in FIG. 8, including "hits" 356 for the
respective search terms "BMW" and "Restoration" and links
312 to related pages as shown at pages 302 to 310 of FIG.
8.
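
By way of illustration only, the automatic query formulation of step 270 might be sketched as follows. The function name formulate_query, its parameters, and the leading "-" syntax for excluded terms are assumptions made for this sketch; the specification does not prescribe a query syntax.

```python
def formulate_query(frame_title, include=(), exclude=()):
    """Build a search query string from frame attributes (illustrative only)."""
    # The frame title supplies the base terms, e.g. "BMW Restoration".
    candidates = frame_title.split() + list(include)

    # Drop case-insensitive duplicates while preserving the original order.
    seen = set()
    positive = []
    for term in candidates:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            positive.append(term)

    # Assumed syntax: a leading "-" marks a term to be excluded.
    negative = ["-" + term for term in exclude]
    return " ".join(positive + negative)


# Frame title as the only attribute, plus one hypothetical excluded term.
query = formulate_query("BMW Restoration", exclude=["parts"])
```

Here the included and excluded terms stand in for the entries a user might make at fields 194 and 196, respectively.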

Following identification and retrieval of the initial set,
method 2 would proceed to step 272 as shown in FIG. 12 to
expand initial set 300 to root set 354. As described, method
2 would accomplish expansion of the initial set by parsing
the links from initial pages 304 to 310 and employing means
such as crawlers and link libraries to identify pages located
one link away from the respective pages of initial set 300, to
include pages 314 to 352 as seen in FIG. 8.
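
The expansion of step 272 might be sketched as below, assuming the link structure is already available as an in-memory mapping rather than obtained from crawlers or link libraries. The name expand_to_root_set and the out_links mapping are hypothetical, and treating "one link away" as covering both outgoing and incoming links is an assumption of this sketch.

```python
def expand_to_root_set(initial_set, out_links):
    """Return the initial pages plus every page one link away from them.

    out_links maps a page identifier to the pages it links to
    (a hypothetical stand-in for crawler or link-library results).
    """
    initial = set(initial_set)
    root = set(initial)

    # Pages that the initial pages link to.
    for page in initial:
        root.update(out_links.get(page, ()))

    # Pages that link into the initial set.
    for source, targets in out_links.items():
        if any(target in initial for target in targets):
            root.add(source)

    return root


links = {"a": ["b"], "c": ["a"], "d": ["e"]}
root_set = expand_to_root_set({"a"}, links)
```

In this small example, page "d" is excluded because neither it nor its target "e" is within one link of the initial set.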

Once method 2 has expanded the initial set to the root set
as shown in FIG. 12 at step 272, method 2 would proceed to
step 274 for ranking the pages of the root set by authority as
explained above. Thereafter, method 2 would advance
to step 276 shown in FIG. 12 where the ranked pages of the
root set would be truncated, i.e., reduced in accordance with
the specification provided by the user at entry field 191 of
editing partition 190 above described.

And, once the pages of the root set ranked by authority 
were reduced to the requested number; i.e. truncated, 
method 2 would return the pages as set 130 shown in FIG. 
4 to populate selected frame 128, "BMW Restoration." 
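
Steps 274 and 276 might be sketched as follows. As an assumption, a plain hub-and-authority iteration in the style of Kleinberg's HITS algorithm is substituted here for the ranking procedure described earlier, and the filtering steps are omitted. The names normalize, rank_by_authority and populate_frame, and the fixed iteration count, are hypothetical.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def rank_by_authority(pages, links, iterations=20):
    """Order pages by authority score; links is a list of (source, target)."""
    pages = list(pages)
    known = set(pages)
    # Keep only links between distinct pages of the root set.
    edges = [(s, t) for s, t in links if s in known and t in known and s != t]

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub values pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for s, t in edges:
            new_auth[t] += hub[s]
        # A page's hub value is the sum of the authorities it points to.
        new_hub = {p: 0.0 for p in pages}
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth, hub = normalize(new_auth), normalize(new_hub)

    return sorted(pages, key=lambda p: auth[p], reverse=True)


def populate_frame(pages, links, requested):
    """Truncate the ranked list to the number of pages the user requested."""
    return rank_by_authority(pages, links)[:requested]


root = ["h1", "h2", "a1", "a2"]
edges = [("h1", "a1"), ("h2", "a1"), ("h1", "a2")]
frame_pages = populate_frame(root, edges, 2)
```

The requested count corresponds to the value entered at field 191; the two pages returned are those pointed to by the most, and the strongest, hubs.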

Following return of the population of pages to frame 128,
the user at interface 138 and associated screens 176 and 202
shown in FIGS. 6, 7, could review lists 180 and 182 of,
respectively, authority and hub pages and their content to
determine whether the page population was acceptable, or
whether modifications should be entered at editing partition
190 to the frame attributes to, thereby, modify the frame
population with a new population, again retrieved as above
described. As will be appreciated, this interactive and
iterative process would continue at the user's discretion until the
page population provided at frame 128 was acceptable. Once
the page population was found acceptable, the user could
then terminate method 2.

While this invention has been described in its preferred 
form, it will be appreciated that changes may be made in the 
form, procedure and sequences of its various steps and 
elements without departing from its spirit or scope. 

What we claim is: 

1. A method for cataloging and ranking information 
comprising the steps of: 

a. enabling a user to interactively define a structure for 
cataloging information in one or more categories, each 
category having one or more attributes for defining the 
category; 

b. selecting a category of the structure; 

c. identifying a population of information elements for the 
selected category automatically based upon the respec- 
tive category attributes; 

d. ranking the information elements for the particular 
category automatically based upon relevance to cat- 
egory attributes; and 

e. populating the selected category with ranked informa- 
tion elements. 

2. The method of claim 1 further including filtering the 
information elements to improve ranking accuracy. 

3. The method of claim 2 wherein filtering includes 
applying filtering at predetermined points during ranking. 

4. The method of claim 3 wherein identifying information 
elements includes identifying links between information 
elements, and ranking information elements includes itera- 
tively determining relevance of the information elements to 
the respective category attributes based upon the affinity 
between information elements, and wherein one or more 
sources of spurious effects arise during the determining of 
relevance which adversely affect ranking accuracy, and 
wherein filtering includes steps for diminishing one or more 
sources of spurious effects. 

5. The method of claim 4 wherein the sources of spurious
effects include self-promotion and filtering includes steps for
diminishing the effects of self-promotion.

6. The method of claim 5 wherein the filtering steps for
diminishing the effects of self-promotion include discarding
links from information elements of a first source of
information elements to information elements of a second source
of information elements, where the first information source
and the second information source are the same.

7. The method of claim 6 wherein the information ele- 
ments are pages of the World Wide Web, and information 
sources for the pages are Web sites, the Web including at 
least four classes of Web sites designated, respectively, class 
A, class B, class C and class D, each respective class having 
a Web address including four octets of identification, and 
wherein Web sites are considered the same where: the two 
most significant octets of the addresses for class A and class 
B sites are the same; and the three most significant octets of 
the addresses for class C sites are the same; and all four 
octets of the addresses for class D sites are the same. 

8. The method of claim 6 wherein the filtering steps for 
diminishing the effects of self-promotion are applied proxi- 
mate the beginning of determining information element 
ranking. 

9. The method of claim 4 wherein the information ele- 
ments include hubs and authorities, and the sources of 
spurious effects include redundant hubs, and wherein filter- 
ing includes steps for diminishing the effects of redundant 
hubs. 
10. The method of claim 9 wherein the filtering steps for
diminishing the effects of redundant hubs include
identifying a hub having the highest number of links at
approximately the end of a ranking iteration, and setting to zero the
authority values of information elements that are linked to
by the hub having the highest number of links, re-computing
hub values and iterating determination of ranking.

11. The method of claim 9 wherein the filtering steps for
diminishing the effects of redundant hubs are applied
proximate the end of determining information element ranking.

12. The method of claim 4 wherein the information
elements include hubs and authorities, and the sources of
spurious effects include false authority, and wherein filtering
includes steps for diminishing the effects of false authority.

13. The method of claim 12 wherein the filtering steps for
diminishing the effects of false authority include allowing
each link in an information element to have its own hub
value, such that the hub value of the information element
becomes a function of the particular link, and, when ranking
authorities, allowing a value associated with the authority to
be incremented by the hub value of the link, and when
ranking hubs, allowing the value of the authority linked to
be used to increment the hub value.

14. The method of claim 12 wherein the filtering steps for
diminishing the effects of false authority are applied during
the determining of information element ranking.

15. The method of claim 4 wherein the information
elements include hubs and authorities, and the sources of
spurious effects include related-page factors, and wherein
filtering includes steps for diminishing the effects of
related-page factors.

16. The method of claim 15 wherein the filtering steps for
diminishing related-page factors include re-packing the
authority of any source of information elements prior to
iterating a determination of ranking by setting to zero all
authority values of the same information element source,
except the largest authority value.

17. The method of claim 16 wherein the information
elements are pages of the World Wide Web, and information
sources for the pages are Web sites, the Web including at
least four classes of Web sites designated, respectively, class
A, class B, class C and class D, each respective class having
a Web address including four octets of identification, and
wherein Web sites are considered the same where: the two
most significant octets of the addresses for class A and class
B sites are the same; and the three most significant octets of
the addresses for class C sites are the same; and all four
octets of the addresses for class D sites are the same.

18. The method of claim 15 wherein the filtering steps for
diminishing the effects of related-page factors are applied
during the determining of information element ranking.

19. The method of claim 4 wherein the sources of spurious
effects include self-promotion; redundant hubs; false
authority; and related-page factors; and filtering includes steps for
diminishing the spurious effects by combining steps for
reducing self-promotion, redundant hubs, false authority and
related-page factors, and wherein the filtering steps for
diminishing the effects of self-promotion are applied
proximate the beginning of determining information element
ranking; filtering steps for diminishing the effects of
redundant hubs are applied proximate the end of determining
information element ranking; filtering steps for diminishing
the effects of false authority are applied during the
determining of information element ranking; and filtering steps
for diminishing the effects of related-page factors are
applied during the determining of information element
ranking prior to iteration of ranking determinations.

20. The method of claim 1 wherein enabling a user to
define an information structure includes providing a display
interface having fields at which the user can enter attributes
to be associated with the information structure categories.

21. The method of claim 20 wherein identifying a
population of information elements includes automatically
generating a search query based upon the category attributes
entered at the interface.

22. The method of claim 21 wherein providing the
interface includes providing the interface with one or more
screens, respectively, having one or more partitions.

23. The method of claim 22 wherein providing the
interface with one or more screens, respectively, having one or
more partitions, includes providing at least one partition for
enabling modification of category attributes.

24. The method of claim 23 wherein enabling
modification of category attributes includes providing an entry field
permitting the user to add attributes.

25. The method of claim 24 wherein enabling
modification of category attributes includes providing an entry field
permitting the user to delete attributes.

26. The method of claim 25 wherein enabling
modification of category attributes includes providing an entry field
permitting the user to exclude attributes.

27. The method of claim 26 wherein enabling
modification of category attributes includes permitting the user to
select predefined attributes with which known information
elements are associated.

28. The method of claim 22 wherein providing the
interface with one or more screens having one or more partitions,
includes providing at least one partition for presenting the
information structure.

29. The method of claim 28 wherein providing the
interface with at least one partition for presenting the information
structure includes providing at least one partition for
presenting a graphical representation of the information
structure and for enabling modification of the information
structure.

30. The method of claim 29 wherein enabling
modification of the information structure includes enabling the
adding, deleting, and moving of categories within the
information structure.

31. The method of claim 22 wherein providing the
interface with one or more screens having one or more partitions,
includes providing at least one partition for presenting one
or more information elements having information content
which populate a selected category of the information
structure.

32. The method of claim 31 wherein providing the
interface with one or more screens having one or more partitions,
includes providing at least one partition for presenting the
contents of a selected information element.

33. The method of claim 22 wherein providing the
interface with one or more screens having one or more partitions,
includes enabling navigation between screens, and further
includes providing at least one partition for presenting the
information structure and at least one partition for enabling
modification of category attributes.

34. The method of claim 33 wherein providing the
interface with one or more screens having one or more partitions
includes providing at least one partition for presenting one
or more information elements having information content
which populate the selected category of the information
structure.

35. The method of claim 34 wherein providing the
interface with one or more screens having one or more partitions
further includes providing at least one partition for
presenting the content of an information element which populates
the selected category of the information structure.

36. A method for cataloging, ranking and filtering
information comprising the steps of:

a. presenting a display interface to a user, the interface
having one or more screens, respectively, having one or
more partitions for enabling the user to interactively
define a structure for cataloging information in one or
more categories, each category having one or more
attributes for defining the category;

b. enabling the user to select a category of the structure; 

c. automatically identifying a population of information
elements for the selected category based upon the
respective category attributes designated by the user,
and further, identifying links between information
elements;

d. ranking the information elements for the particular
category automatically based upon relevance to category
attributes by iteratively determining relevance of
the information elements to the respective category
attributes based upon the affinity between information
elements; wherein one or more sources of spurious
effects arise during the determining of relevance which
adversely affect ranking accuracy;

e. filtering the information elements to improve ranking
accuracy by applying filtering at predetermined points
during ranking; and

f. populating the selected category with ranked informa- 
tion elements. 

37. The method of claim 36 wherein providing the
interface with one or more screens having one or more partitions,
includes providing at least one partition for presenting the
information structure and at least one partition for enabling
modification of category attributes and at least one partition
for presenting one or more information elements having
information content.

38. The method of claim 37 wherein the sources of
spurious effects include self-promotion and filtering
includes steps for diminishing the effects of self-promotion
applied proximate the beginning of determining information
element ranking.

39. The method of claim 37 wherein the sources of
spurious effects include redundant hubs and filtering
includes steps for diminishing the effects of redundant hubs
applied proximate the end of determining information
element ranking.

40. The method of claim 37 wherein the sources of 
spurious effects include false authority and filtering includes 
steps for diminishing the effects of false authority applied 
during the determining of information element ranking. 

41. The method of claim 37 wherein the sources of
spurious effects include related-page factors and filtering
includes steps for diminishing the effects of related-page
factors applied during the determining of information
element ranking.
42. A method for displaying an interface at which a user
may interactively develop a structure for cataloging
information in one or more categories, each category having one
or more attributes for defining the category, and at which
interface the user may automatically populate selected
categories with information elements, the method comprising
the steps of:

a. providing the interface with one or more screens, 
respectively, having one or more partitions; 

b. providing at least one partition for presenting the 
information structure and enabling its modification; 

c. providing at least one partition for presenting the 
attributes of a selected category and enabling modifi- 
cation of category attributes; and 

d. enabling automatic identification of a population of 
information elements with a search query based upon 
the category attributes entered at the interface. 

43. The method of claim 42 wherein providing the inter- 
face includes providing fields at which the user can enter 
attributes to be associated with the information structure 
categories. 

44. The method of claim 43 wherein providing the inter- 
face with one or more screens having one or more partitions, 
includes enabling navigation between screens and providing 
at least one partition for presenting one or more information 
elements having information content which populate a 
selected category of the information structure. 

45. The method of claim 43 wherein providing the
interface with one or more screens having one or more partitions
further includes providing at least one partition for
presenting the content of an information element which populates
the selected category of the information structure.

46. The method of claim 45 wherein enabling modifica- 
tion of category attributes includes providing an entry field 
permitting the user to add attributes. 

47. The method of claim 45 wherein enabling
modification of category attributes includes providing an entry field
permitting the user to delete attributes.

48. The method of claim 45 wherein enabling modifica- 
tion of category attributes includes providing an entry field 
permitting the user to exclude attributes. 

49. The method of claim 45 wherein enabling modifica- 
tion of category attributes includes permitting the user to 
select predefined attributes with which known information 
elements are associated. 

50. The method of claim 45 wherein enabling modifica- 
tion of the information structure includes enabling the 
adding, deleting, and moving of categories within the infor- 
mation structure. 
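
For illustration only, the same-site test recited in claim 7 and the link discarding recited in claim 6 might be sketched as follows. The class boundaries used are those of classful IPv4 addressing; treating every first octet of 224 or above as class D, treating addresses of different classes as different sites, and the names address_class, same_site, discard_self_promotion and address_of are all assumptions of this sketch.

```python
# Number of most significant octets that must match, per class (claim 7).
SIGNIFICANT_OCTETS = {"A": 2, "B": 2, "C": 3, "D": 4}


def address_class(first_octet):
    """Classify an address by its first octet (classful IPv4 ranges)."""
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "D"  # assumption: everything from 224 upward is treated as class D


def same_site(addr1, addr2):
    """Return True when two dotted-quad addresses count as the same site."""
    octets1 = [int(part) for part in addr1.split(".")]
    octets2 = [int(part) for part in addr2.split(".")]
    class1 = address_class(octets1[0])
    if class1 != address_class(octets2[0]):
        return False  # assumption: different classes are never the same site
    count = SIGNIFICANT_OCTETS[class1]
    return octets1[:count] == octets2[:count]


def discard_self_promotion(links, address_of):
    """Drop links whose source and target pages belong to the same site.

    links is a list of (source, target) page identifiers and address_of
    maps each page identifier to the address of its hosting site.
    """
    return [
        (source, target)
        for source, target in links
        if not same_site(address_of[source], address_of[target])
    ]


address_of = {"p1": "10.1.2.3", "p2": "10.1.200.4", "p3": "10.2.0.1"}
kept = discard_self_promotion([("p1", "p2"), ("p1", "p3")], address_of)
```

In this example the link from p1 to p2 is discarded because both addresses are class A and share their two most significant octets, while the link from p1 to p3 is retained.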

***** 